Show simple item record

dc.creator: Rodomagoulakis I., Katsamanis A., Potamianos G., Giannoulis P., Tsiami A., Maragos P. [en]
dc.date.accessioned: 2023-01-31T09:51:40Z
dc.date.available: 2023-01-31T09:51:40Z
dc.date.issued: 2017
dc.identifier: 10.1016/j.csl.2017.02.004
dc.identifier.issn: 08852308
dc.identifier.uri: http://hdl.handle.net/11615/78534
dc.description.abstract: The paper focuses on the design of a practical system pipeline for always-listening, far-field spoken command recognition in everyday smart indoor environments that consist of multiple rooms equipped with sparsely distributed microphone arrays. Such environments, for example homes and multi-room offices, present challenging acoustic scenes to state-of-the-art speech recognizers, especially under always-listening operation, due to low signal-to-noise ratios, frequent overlaps of target speech, acoustic events, and background noise, as well as inter-room interference and reverberation. In addition, recognition of target commands often needs to be accompanied by their spatial localization, at least at the room level, to account for users in different rooms, providing command disambiguation and room-localized feedback. To address the above requirements, the use of parallel recognition pipelines is proposed, one per room of interest. The approach is enabled by a room-dependent speech activity detection module that employs appropriate multichannel features to determine speech segments and their room of origin, feeding them to the corresponding room-dependent pipelines for further processing. These consist of the traditional cascade of far-field spoken command detection and recognition, the former based on the detection of “activating” key-phrases. Robustness to the challenging environments is pursued by a number of multichannel combination and acoustic modeling techniques, thoroughly investigated in the paper. In particular, channel selection, beamforming, and decision fusion of single-channel results are considered, with the latter performing best. Additional gains are observed when the employed acoustic models are trained on appropriately simulated reverberant and noisy speech data and are channel-adapted to the target environments. Further issues investigated concern the inter-dependencies of the various system components, demonstrating the superiority of joint optimization of the component tunable parameters over their separate or sequential optimization. The proposed approach is developed for the Greek language, exhibiting promising performance in real recordings in a four-room apartment, as well as a two-room office. For example, in the latter, a 76.6% command recognition accuracy is achieved on a speaker-independent test, employing a 180-sentence decoding grammar. This result represents a 46% relative improvement over conventional beamforming. © 2017 Elsevier Ltd [en]
dc.language.iso: en [en]
dc.source: Computer Speech and Language [en]
dc.source.uri: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85019697973&doi=10.1016%2fj.csl.2017.02.004&partnerID=40&md5=d1cff06de05ad837190b20adca3301d7
dc.subject: Acoustic noise [en]
dc.subject: Architectural acoustics [en]
dc.subject: Automation [en]
dc.subject: Beamforming [en]
dc.subject: Intelligent buildings [en]
dc.subject: Microphones [en]
dc.subject: Pipeline processing systems [en]
dc.subject: Pipelines [en]
dc.subject: Reverberation [en]
dc.subject: Signal to noise ratio [en]
dc.subject: Speech [en]
dc.subject: Speech communication [en]
dc.subject: Channel selection [en]
dc.subject: Decision fusion [en]
dc.subject: Distant speech recognition [en]
dc.subject: Keyword spotting [en]
dc.subject: Multichannel processing [en]
dc.subject: Smart homes [en]
dc.subject: Speech activity detections [en]
dc.subject: Speech recognition [en]
dc.subject: Academic Press [en]
dc.title: Room-localized spoken command recognition in multi-room, multi-microphone environments [en]
dc.type: journalArticle [en]


Files in this item

There are no files associated with this item.

This item appears in the following collections