
DC Field | Value | Language
dc.creator | Marcheret E., Potamianos G., Vopicka J., Goel V. | en
dc.date.accessioned | 2023-01-31T08:57:18Z |
dc.date.available | 2023-01-31T08:57:18Z |
dc.date.issued | 2015 |
dc.identifier.uri | http://hdl.handle.net/11615/76340 |
dc.description.abstract | Appearance-based feature extraction constitutes the dominant approach to visual speech representation in a variety of problems, such as automatic speechreading, visual speech detection, and others. To obtain the visual features, a rectangular region of interest (ROI) containing the speaker’s mouth is typically extracted first, most commonly followed by a discrete cosine transform (DCT) of the ROI pixel values and a feature selection step. Although algorithmically simple and computationally efficient, the approach suffers from the DCT's lack of invariance to typical ROI deformations, which stem primarily from variability in the speaker’s head pose and from small tracking inaccuracies. To address this problem, this paper investigates the recently introduced scattering transform as an alternative to the DCT for ROI representation within the appearance-based framework, suitable for visual speech applications. Several such tasks are considered, namely visual-only speech activity detection, visual-only and audio-visual sub-phonetic classification, and audio-visual speech synchrony detection, all employing deep neural network classifiers with either DCT or scattering-based visual features. Comparative experiments with the resulting systems on a large audio-visual corpus of frontal-face videos demonstrate the superiority of the scattering transform over the DCT in all cases. © 2015 Auditory-Visual Speech Processing 2015, AVSP 2015, held in conjunction with Facial Analysis and Animation, FAA 2015 - 1st Joint Conference on Facial Analysis, Animation, and Auditory-Visual Speech Processing, FAAVSP 2015. All rights reserved. | en
dc.language.iso | en | en
dc.source | Auditory-Visual Speech Processing 2015, AVSP 2015, held in conjunction with Facial Analysis and Animation, FAA 2015 - 1st Joint Conference on Facial Analysis, Animation, and Auditory-Visual Speech Processing, FAAVSP 2015 | en
dc.source.uri | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85016058046&partnerID=40&md5=1b72cfc040a8fc66cdbcc0776f93f4a6 |
dc.subject | Audio systems | en
dc.subject | Deep neural networks | en
dc.subject | Feature extraction | en
dc.subject | Image segmentation | en
dc.subject | Speech processing | en
dc.subject | Speech recognition | en
dc.subject | Audio-visual | en
dc.subject | Audio-visual synchrony | en
dc.subject | Automatic speechreading | en
dc.subject | Region-of-interest | en
dc.subject | Regions of interest | en
dc.subject | Scattering transforms | en
dc.subject | Speech activity detections | en
dc.subject | Speechreading | en
dc.subject | Visual speech | en
dc.subject | Visual speech activity detection | en
dc.subject | Discrete cosine transforms | en
dc.subject | The International Society for Computers and Their Applications (ISCA) | en
dc.title | Scattering vs. Discrete Cosine Transform Features in Visual Speech Processing | en
dc.type | conferenceItem | en
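The baseline pipeline described in the abstract (a 2-D DCT of the mouth ROI pixel values followed by a feature-selection step) can be sketched as below, together with a quick check of the shift sensitivity the authors point to. This is a minimal illustration, not the authors' implementation: the zigzag selection of low-frequency coefficients, the ROI size, the number of retained coefficients, and the use of a 1-pixel circular shift to mimic a small tracking error are all assumptions, and the scattering transform itself is not implemented here.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (n x n); row k is the k-th basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] *= np.sqrt(1.0 / n)   # DC row scaling
    c[1:, :] *= np.sqrt(2.0 / n)  # AC row scaling
    return c


def dct2(roi: np.ndarray) -> np.ndarray:
    """Separable 2-D DCT-II of a grayscale ROI (transform rows, then columns)."""
    h, w = roi.shape
    return dct_matrix(h) @ roi @ dct_matrix(w).T


def zigzag_indices(h: int, w: int) -> list[tuple[int, int]]:
    """JPEG-style zigzag scan order, so low spatial frequencies come first."""
    def key(rc: tuple[int, int]) -> tuple[int, int]:
        s = rc[0] + rc[1]
        return (s, rc[0] if s % 2 else -rc[0])
    return sorted(((r, c) for r in range(h) for c in range(w)), key=key)


def dct_features(roi: np.ndarray, num_coeffs: int = 30) -> np.ndarray:
    """Keep the first num_coeffs zigzag-ordered DCT coefficients.

    ASSUMPTION: zigzag low-frequency selection stands in for the paper's
    unspecified feature-selection step.
    """
    coeffs = dct2(roi.astype(np.float64))
    order = zigzag_indices(*coeffs.shape)[:num_coeffs]
    return np.array([coeffs[r, c] for r, c in order])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    roi = rng.random((32, 48))         # hypothetical stand-in for a mouth ROI
    shifted = np.roll(roi, 1, axis=1)  # 1-pixel horizontal misalignment
    f, g = dct_features(roi), dct_features(shifted)
    print("relative feature change:", np.linalg.norm(f - g) / np.linalg.norm(f))
```

Because the DCT basis is orthonormal, the transform preserves ROI energy. Even so, a one-pixel misalignment redistributes that energy across coefficients, so the selected features change noticeably. This is the kind of non-invariance that motivates trying a deformation-stable representation such as the scattering transform.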


Files in this item

There are no files associated with this item.

