Show simple item record
Room-localized speech activity detection in multi-microphone smart homes
DC Field | Value | Language |
---|---|---|
dc.creator | Giannoulis P., Potamianos G., Maragos P. | en |
dc.date.accessioned | 2023-01-31T07:42:13Z | |
dc.date.available | 2023-01-31T07:42:13Z | |
dc.date.issued | 2019 | |
dc.identifier | 10.1186/s13636-019-0158-8 | |
dc.identifier.issn | 1687-4714 | |
dc.identifier.uri | http://hdl.handle.net/11615/72375 | |
dc.description.abstract | Voice-enabled interaction systems in domestic environments have attracted significant interest recently, being the focus of smart home research projects and commercial voice assistant home devices. Within the multi-module pipelines of such systems, speech activity detection (SAD) constitutes a crucial component, providing input to their activation and speech recognition subsystems. In typical multi-room domestic environments, SAD may also convey spatial intelligence to the interaction, in addition to its traditional temporal segmentation output, by assigning speech activity at the room level. Such room-localized SAD can, for example, disambiguate user command referents, allow localized system feedback, and enable parallel voice interaction sessions by multiple subjects in different rooms. In this paper, we investigate a room-localized SAD system for smart homes equipped with multiple microphones distributed in multiple rooms, significantly extending our earlier work. The system employs a two-stage algorithm, incorporating a set of hand-crafted features specially designed to discriminate room-inside vs. room-outside speech at its second stage, refining SAD hypotheses obtained at its first stage by traditional statistical modeling and acoustic front-end processing. Both algorithmic stages exploit multi-microphone information, combining it at the signal, feature, or decision level. The proposed approach is extensively evaluated on both simulated and real data recorded in a multi-room, multi-microphone smart home, significantly outperforming alternative baselines. Further, it remains robust to reduced microphone setups, while also comparing favorably to deep learning-based alternatives. © 2019, The Author(s). | en |
dc.language.iso | en | en |
dc.source | EURASIP Journal on Audio, Speech, and Music Processing | en |
dc.source.uri | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85071630220&doi=10.1186%2fs13636-019-0158-8&partnerID=40&md5=57b484838f86afd53c59ec2bf9e83a70 | |
dc.subject | Automation | en |
dc.subject | Deep learning | en |
dc.subject | Intelligent buildings | en |
dc.subject | Microphones | en |
dc.subject | Refining | en |
dc.subject | Speech | en |
dc.subject | Active room selection | en |
dc.subject | Microphone arrays | en |
dc.subject | Multi channel | en |
dc.subject | Smart homes | en |
dc.subject | Speech activity detections | en |
dc.subject | Speech recognition | en |
dc.subject | Springer International Publishing | en |
dc.title | Room-localized speech activity detection in multi-microphone smart homes | en |
dc.type | journalArticle | en |
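The abstract describes a two-stage pipeline: a first stage that produces per-microphone SAD hypotheses, and a second stage that refines them into room-level decisions by combining multi-microphone information at the decision level. As a rough illustration only (not the authors' method; the energy threshold, framing parameters, and majority-vote fusion below are all hypothetical simplifications), such a pipeline could be sketched as:

```python
import numpy as np

def frame_energies(signal, frame_len=400, hop=160):
    """Short-time log-energy per frame (simplified acoustic front end)."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    energies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        energies[i] = 10.0 * np.log10(np.sum(frame ** 2) + 1e-12)
    return energies

def stage1_sad(signal, threshold_db=-30.0):
    """Stage 1 (hypothetical): per-microphone speech/non-speech decision
    per frame, via a simple energy threshold standing in for the paper's
    statistical modeling."""
    return frame_energies(signal) > threshold_db

def stage2_room_fusion(room_decisions):
    """Stage 2 (hypothetical): decision-level fusion. For each frame,
    compute the fraction of microphones in each room voting 'speech',
    then assign the frame to the room with the strongest support,
    or to no room if support is weak."""
    rooms = sorted(room_decisions)
    # support[r, t] = fraction of room r's microphones voting speech at frame t
    support = np.stack(
        [np.mean(np.stack(room_decisions[r]).astype(float), axis=0) for r in rooms]
    )
    best = np.argmax(support, axis=0)
    speech = support[best, np.arange(support.shape[1])] > 0.5
    return [rooms[b] if s else None for b, s in zip(best, speech)]
```

A caller would run `stage1_sad` on each microphone signal, group the resulting decision arrays by room, and pass that dict to `stage2_room_fusion` to obtain a per-frame active-room label; the paper's actual second stage instead uses hand-crafted features to discriminate room-inside vs. room-outside speech.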
Files in this item
Files | Size | Format | View |
---|---|---|---|
There are no files associated with this item.