Show simple item record

dc.creator: Koumparoulis A., Potamianos G., Thomas S., da Silva Morais E.
dc.date.accessioned: 2023-01-31T08:45:26Z
dc.date.available: 2023-01-31T08:45:26Z
dc.date.issued: 2020
dc.identifier: 10.21437/Interspeech.2020-3003
dc.identifier.issn: 2308457X
dc.identifier.uri: http://hdl.handle.net/11615/75307
dc.description.abstract: We focus on the problem of efficient architectures for lipreading that allow trading off computational resources for visual speech recognition accuracy. In particular, we make two contributions: First, we introduce MobiLipNetV3, an efficient and accurate lipreading model, based on our earlier work on MobiLipNetV2 and incorporating recent advances in convolutional neural network architectures. Second, we propose a novel recognition paradigm, called MultiRate Ensemble (MRE), that combines a “lean” and a “full” MobiLipNetV3 in the lipreading pipeline, with the latter applied at a lower frame rate. This architecture yields a family of systems offering multiple accuracy vs. efficiency operating points depending on the frame-rate decimation of the “full” model, thus allowing adaptation to the available device resources. We evaluate our approach on the TCD-TIMIT corpus, popular in speaker-independent lipreading of continuous speech. The proposed MRE family of systems can be up to 73 times more efficient than residual neural network based lipreading, and up to twice as efficient as MobiLipNetV2, while in both cases reaching up to 8% absolute WER reduction, depending on the chosen MRE operating point. For example, a temporal decimation of three yields a 7% absolute WER reduction and a 26% relative decrease in computations over MobiLipNetV2. © 2020 ISCA
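The abstract describes the MultiRate Ensemble (MRE) as running a "full" model only on every k-th frame while a "lean" model covers the rest. A minimal sketch of that scheduling idea, assuming per-frame models and a simple pick-by-rate combination (the function names, cost model, and combination rule below are illustrative assumptions, not the paper's implementation):

```python
def multirate_ensemble(frames, lean_model, full_model, k=3):
    """Apply the accurate full_model to every k-th frame and the
    cheap lean_model to all remaining frames (temporal decimation k)."""
    outputs = []
    for i, frame in enumerate(frames):
        if i % k == 0:
            outputs.append(full_model(frame))   # low-rate, accurate path
        else:
            outputs.append(lean_model(frame))   # full-rate, efficient path
    return outputs


def average_cost(c_lean, c_full, k):
    """Average per-frame compute when the full model (cost c_full)
    runs on 1/k of the frames and the lean model (cost c_lean) on the rest."""
    return (c_full + c_lean * (k - 1)) / k
```

Varying `k` traces out the accuracy-vs-efficiency operating points the abstract mentions: larger `k` lowers the average cost toward the lean model's, at some accuracy loss.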
dc.language.iso: en
dc.source: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
dc.source.uri: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85098111271&doi=10.21437%2fInterspeech.2020-3003&partnerID=40&md5=3756544ba0d426bed39827899c5d31fa
dc.subject: Convolutional neural networks
dc.subject: Deep learning
dc.subject: Network architecture
dc.subject: Speech communication
dc.subject: Computational resources
dc.subject: Continuous speech
dc.subject: Device resources
dc.subject: Efficient architecture
dc.subject: Frame rate
dc.subject: Operating points
dc.subject: Speaker independents
dc.subject: Visual speech recognition
dc.subject: Speech recognition
dc.subject: International Speech Communication Association
dc.title: Resource-adaptive deep learning for visual speech recognition
dc.type: conferenceItem


Files in this item


There are no files associated with this item.

This item appears in the following collection(s)
