Show simple item record

dc.creator: Koumparoulis A., Potamianos G., Thomas S., da Silva Morais E. [en]
dc.date.accessioned: 2023-01-31T08:45:25Z
dc.date.available: 2023-01-31T08:45:25Z
dc.date.issued: 2021
dc.identifier: 10.23919/EUSIPCO54536.2021.9616215
dc.identifier.isbn: 9789082797060
dc.identifier.issn: 22195491
dc.identifier.uri: http://hdl.handle.net/11615/75305
dc.description.abstract: In this paper, we consider the problem of resource-efficient architectures for audio-visual automatic speech recognition (AVSR). Specifically, we complement our earlier work that introduced efficient convolutional neural networks (CNNs) for visual-only speech recognition, by focusing here on the sequence modeling component of the architecture, proposing a novel resource-efficient time-delay neural network (TDNN) that we extend for AVSR. In more detail, we introduce the sTDNN-F module, which combines the factored TDNN (TDNN-F) with grouped fully-connected layers and the shuffle operation. We then develop an AVSR system based on the sTDNN-F, incorporating the efficient CNNs of our earlier work and other standard visual processing and speech recognition modules. We evaluate our approach on the popular TCD-TIMIT corpus, under two speaker-independent training/testing scenarios. Our best sTDNN-F based AVSR system turns out 74% more efficient than a traditional TDNN one and 35% more efficient than TDNN-F, while maintaining similar recognition accuracy and noise robustness, and also significantly outperforming its audio-only counterpart. © 2021 European Signal Processing Conference. All rights reserved. [en]
dc.language.iso: en [en]
dc.source: European Signal Processing Conference [en]
dc.source.uri: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85123160449&doi=10.23919%2fEUSIPCO54536.2021.9616215&partnerID=40&md5=450341a7da64e25d2560fcb5babbe19a
dc.subject: Audio acoustics [en]
dc.subject: Convolutional neural networks [en]
dc.subject: Network architecture [en]
dc.subject: Speech recognition [en]
dc.subject: Audio-visual [en]
dc.subject: Audio-visual automatic speech recognition [en]
dc.subject: Audiovisual speech recognition [en]
dc.subject: Automatic speech recognition [en]
dc.subject: Automatic speech recognition system [en]
dc.subject: Convolutional neural network [en]
dc.subject: Mobilipnet [en]
dc.subject: Neural network architecture [en]
dc.subject: Resource-efficient [en]
dc.subject: Time delay neural networks [en]
dc.subject: Computational efficiency [en]
dc.subject: European Signal Processing Conference, EUSIPCO [en]
dc.title: Resource-efficient TDNN Architectures for Audio-visual Speech Recognition [en]
dc.type: conferenceItem [en]


Files in this item

Files | Size | Format | View

There are no files associated with this item.

This item appears in the following collections
