Audio-visual speech recognition using depth information from the Kinect in noisy video conditions
dc.creator | Galatas, G. | en |
dc.creator | Potamianos, G. | en |
dc.creator | Makedon, F. | en |
dc.date.accessioned | 2015-11-23T10:26:54Z | |
dc.date.available | 2015-11-23T10:26:54Z | |
dc.date.issued | 2012 | |
dc.identifier | 10.1145/2413097.2413100 | |
dc.identifier.isbn | 9781450313001 | |
dc.identifier.uri | http://hdl.handle.net/11615/27629 | |
dc.description.abstract | In this paper we build on our recent work, in which we successfully incorporated facial depth data of a speaker, captured by the Microsoft Kinect device, as a third data stream in an audio-visual automatic speech recognizer. In particular, we investigate whether the depth stream provides sufficient speech information to improve system robustness to noisy audio-visual conditions, thus studying system operation beyond the traditional scenario in which noise is applied to the audio signal alone. For this purpose, we consider four realistic visual modality degradations at various noise levels, and we conduct small-vocabulary recognition experiments on an appropriate, previously collected audio-visual database. Our results demonstrate improved system performance due to the depth modality, as well as a considerable accuracy increase over audio-only speech recognition when both the visual and depth modalities are used. | en |
dc.source.uri | http://www.scopus.com/inward/record.url?eid=2-s2.0-84871979378&partnerID=40&md5=1c3fed620a063e661a88537e93fab25d | |
dc.subject | Audio-visual speech recognition | en |
dc.subject | Depth information | en |
dc.subject | Microsoft Kinect | en |
dc.subject | Video noise | en |
dc.subject | Audio signal | en |
dc.subject | Audio visual speech recognition | en |
dc.subject | Audio-visual | en |
dc.subject | Audio-visual database | en |
dc.subject | Automatic speech recognizers | en |
dc.subject | Data stream | en |
dc.subject | Microsoft | en |
dc.subject | Noise levels | en |
dc.subject | Speech information | en |
dc.subject | System operation | en |
dc.subject | System robustness | en |
dc.subject | Visual modalities | en |
dc.subject | Acoustic noise | en |
dc.subject | Audio acoustics | en |
dc.subject | Speech recognition | en |
dc.title | Audio-visual speech recognition using depth information from the Kinect in noisy video conditions | en |
dc.type | conferenceItem | en |
Files in this item
Files | Size | Type | View |
---|---|---|---|
There are no files associated with this item. |