ATHENA: A Greek multi-sensory database for home automation control
Author
Tsiami, A.; Rodomagoulakis, I.; Giannoulis, P.; Katsamanis, A.; Potamianos, G.; Maragos, P.
Date
2014
Keyword
Abstract
In this paper we present a Greek speech database with real multi-modal data in a smart home two-room environment. In total, 20 speakers were recorded in 240 one-minute long sessions. The recordings include utterances of activation keywords and commands for home automation control, but also phonetically rich sentences and conversational speech. Audio, speaker movements and gestures were captured by 20 condenser microphones installed on the walls and ceiling, 6 MEMS microphones, 2 close-talk microphones and one Kinect camera. The new publicly available database exhibits adverse noise conditions because of background noises and acoustic events performed during the recordings to better approximate a realistic everyday home scenario. Thus, it is suitable for experimentation on voice activity and event detection, source localization, speech enhancement and far-field speech recognition. We present the details of the corpus as well as baseline results on multi-channel voice activity detection and spoken command recognition. Copyright © 2014 ISCA.
Related items
- Audio-visual speech recognition using depth information from the Kinect in noisy video conditions
  Galatas, G.; Potamianos, G.; Makedon, F. (2012): In this paper we build on our recent work, where we successfully incorporated facial depth data of a speaker captured by the Microsoft Kinect device, as a third data stream in an audio-visual automatic speech recognizer. ...
- Multi-room speech activity detection using a distributed microphone network in domestic environments
  Giannoulis, P.; Brutti, A.; Matassoni, M.; Abad, A.; Katsamanis, A.; Matos, M.; Potamianos, G.; Maragos, P. (2015): Domestic environments are particularly challenging for distant speech recognition: reverberation, background noise and interfering sources, as well as the propagation of acoustic events across adjacent rooms, critically ...
- Multimodal fusion and sequence learning for cued speech recognition from videos
  Papadimitriou, K.; Parelli, M.; Sapountzaki, G.; Pavlakos, G.; Maragos, P.; Potamianos, G. (2021): Cued Speech (CS) constitutes a non-vocal mode of communication that relies on lip movements in conjunction with hand positional and gestural cues, in order to disambiguate phonetic information and make it accessible to the ...