Multi-room speech activity detection using a distributed microphone network in domestic environments
Authors
Giannoulis P., Brutti A., Matassoni M., Abad A., Katsamanis A., Matos M., Potamianos G., Maragos P.
Date
2015
Language
en
Abstract
Domestic environments are particularly challenging for distant speech recognition: reverberation, background noise and interfering sources, as well as the propagation of acoustic events across adjacent rooms, critically degrade the performance of standard speech processing algorithms. In this application scenario, a crucial task is the detection and localization of speech events generated by users within the various rooms. A specific challenge of multi-room environments is the inter-room interference that negatively affects speech activity detectors. In this paper, we present and compare different solutions for the multi-room speech activity detection task. The combination of a model-based room-independent speech activity detection module with a room-dependent inside/outside classification stage, based on specific features, provides satisfactory performance. The proposed methods are evaluated on a multi-room, multi-channel corpus, where spoken commands and other typical acoustic events occur in different rooms. © 2015 EURASIP.
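The abstract describes a two-stage combination: a room-independent speech activity decision, followed by a room-dependent inside/outside classification that decides which room a detected speech event belongs to. The sketch below only illustrates how such a combination could be wired, under stated assumptions. The input `speech_prob` is a hypothetical stand-in for the output of the model-based detector. The inside/outside feature (the energy of a room's microphones relative to the mean energy of the other rooms, in dB) and the thresholds `speech_threshold` and `margin_db` are illustrative placeholders. They are not the features or models used in the paper.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Frame:
    # Hypothetical per-frame input (not the paper's feature set):
    # speech_prob stands in for the room-independent, model-based SAD output;
    # room_energy holds a positive mean signal energy per room's microphones.
    speech_prob: float
    room_energy: Dict[str, float]


def multi_room_sad(frames: List[Frame],
                   speech_threshold: float = 0.5,
                   margin_db: float = 3.0) -> List[Set[str]]:
    """Return, for each frame, the set of rooms where speech is declared."""
    decisions: List[Set[str]] = []
    for frame in frames:
        active: Set[str] = set()
        # Stage 1: room-independent speech/non-speech decision.
        if frame.speech_prob >= speech_threshold:
            # Stage 2: room-dependent inside/outside decision. Assumed feature:
            # energy of this room relative to the mean of the other rooms, in dB.
            for room, energy in frame.room_energy.items():
                others = [e for r, e in frame.room_energy.items() if r != room]
                if not others:
                    # Single-room setup: nothing to compare against.
                    active.add(room)
                    continue
                mean_other = sum(others) / len(others)
                ratio_db = 10.0 * math.log10(energy / mean_other)
                if ratio_db > margin_db:
                    active.add(room)
        decisions.append(active)
    return decisions


if __name__ == "__main__":
    frames = [
        Frame(0.9, {"kitchen": 4.0, "living": 1.0}),  # speech, louder in kitchen
        Frame(0.2, {"kitchen": 4.0, "living": 1.0}),  # rejected by stage 1
        Frame(0.9, {"kitchen": 1.0, "living": 1.0}),  # no room dominates
    ]
    print(multi_room_sad(frames))  # [{'kitchen'}, set(), set()]
```

In this toy setup, a frame whose kitchen microphones carry four times the energy of the living-room microphones (about 6 dB) is assigned to the kitchen only. Equal energies produce no inside decision. Gating stage 2 on stage 1 follows the structure described in the abstract. The choice of inside/outside feature and classifier is where an actual implementation would differ.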
Related items
Showing items related by title, author, creator and subject.
- ATHENA: A Greek multi-sensory database for home automation control
  Tsiami, A.; Rodomagoulakis, I.; Giannoulis, P.; Katsamanis, A.; Potamianos, G.; Maragos, P. (2014). In this paper we present a Greek speech database with real multi-modal data in a smart home two-room environment. In total, 20 speakers were recorded in 240 one-minute long sessions. The recordings include utterances of ...
- Audio-visual speech recognition using depth information from the Kinect in noisy video conditions
  Galatas, G.; Potamianos, G.; Makedon, F. (2012). In this paper we build on our recent work, where we successfully incorporated facial depth data of a speaker captured by the Microsoft Kinect device, as a third data stream in an audio-visual automatic speech recognizer. ...
- Multimodal fusion and sequence learning for cued speech recognition from videos
  Papadimitriou K., Parelli M., Sapountzaki G., Pavlakos G., Maragos P., Potamianos G. (2021). Cued Speech (CS) constitutes a non-vocal mode of communication that relies on lip movements in conjunction with hand positional and gestural cues, in order to disambiguate phonetic information and make it accessible to the ...