Show simple item record

dc.creator | Marcheret E., Potamianos G., Vopicka J., Goel V. | en
dc.date.accessioned | 2023-01-31T08:57:17Z |
dc.date.available | 2023-01-31T08:57:17Z |
dc.date.issued | 2015 |
dc.identifier.issn | 2308457X |
dc.identifier.uri | http://hdl.handle.net/11615/76339 |
dc.description.abstract | In this paper, we address the problem of automatically detecting whether the audio and visual speech modalities in frontal pose videos are synchronous or not. This is of interest in a wide range of applications, for example spoof detection in biometrics, lip-syncing, speaker detection and diarization in multi-subject videos, and video data quality assurance. In our adopted approach, we investigate the use of deep neural networks (DNNs) for this purpose. The proposed synchrony DNNs operate directly on audio and visual features over relatively wide contexts, or, alternatively, on appropriate hidden (bottleneck) or output layers of DNNs trained for single-modal or audio-visual automatic speech recognition. In all cases, the synchrony DNN classes consist of the "in-sync" and a number of "out-of-sync" targets, the latter considered at multiples of ±30 msec steps of overall asynchrony between the two modalities. We apply the proposed approach on two multi-subject audio-visual databases, one of high-quality data recorded in studio-like conditions, and one of data recorded by smart cell-phone devices. On both sets, and under a speaker-independent experimental framework, we are able to achieve very low equal-error-rates in distinguishing "in-sync" from "out-of-sync" data. Copyright © 2015 ISCA. | en
dc.language.iso | en | en
dc.source | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH | en
dc.source.uri | https://www.scopus.com/inward/record.uri?eid=2-s2.0-84959124862&partnerID=40&md5=529ba3e5d1d00730557b7400378af159 |
dc.subject | Mobile phones | en
dc.subject | Quality assurance | en
dc.subject | Speech communication | en
dc.subject | Audio-visual | en
dc.subject | Audio-visual database | en
dc.subject | Automatic speech recognition | en
dc.subject | Deep neural networks | en
dc.subject | High quality data | en
dc.subject | Smart cell phones | en
dc.subject | Speaker detection | en
dc.subject | Speaker independents | en
dc.subject | Speech recognition | en
dc.subject | International Speech and Communication Association | en
dc.title | Detecting audio-visual synchrony using deep neural networks | en
dc.type | conferenceItem | en
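The abstract describes the synchrony DNN targets as one "in-sync" class plus "out-of-sync" classes at multiples of ±30 ms of audio-visual asynchrony. A minimal sketch of that class layout, assuming a nearest-offset assignment and a hypothetical maximum number of steps (the function names and the range are illustrative, not from the paper):

```python
STEP_MS = 30  # asynchrony step size stated in the abstract

def synchrony_classes(max_steps):
    """Class offsets in ms: in-sync (0) plus +/- k*30 ms, k = 1..max_steps."""
    offsets = [0] + [sign * k * STEP_MS
                     for k in range(1, max_steps + 1)
                     for sign in (-1, 1)]
    return sorted(offsets)

def offset_to_class(offset_ms, max_steps):
    """Map an observed audio-visual offset (ms) to the nearest class index."""
    classes = synchrony_classes(max_steps)
    return min(range(len(classes)), key=lambda i: abs(classes[i] - offset_ms))
```

For example, with two steps per direction the classes are [-60, -30, 0, 30, 60], and a 35 ms offset maps to the 30 ms class; training examples for the out-of-sync classes would be produced by shifting one modality's feature stream by the corresponding offset.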


Files in this item

Files | Size | Format | View

There are no files associated with this item.

This item appears in the following collections
