  •   University of Thessaly Institutional Repository
  • Scientific Publications of UTH Members (EDPTh)
  • Publications in journals, conferences, book chapters, etc.

Resource-efficient TDNN Architectures for Audio-visual Speech Recognition

Author
Koumparoulis A., Potamianos G., Thomas S., da Silva Morais E.
Date
2021
Language
en
DOI
10.23919/EUSIPCO54536.2021.9616215
Keyword
Audio acoustics
Convolutional neural networks
Network architecture
Speech recognition
Audio-visual
Audio-visual automatic speech recognition
Audiovisual speech recognition
Automatic speech recognition
Automatic speech recognition system
Convolutional neural network
MobiLipNet
Neural network architecture
Resource-efficient
Time delay neural networks
Computational efficiency
European Signal Processing Conference, EUSIPCO
Abstract
In this paper, we consider the problem of resource-efficient architectures for audio-visual automatic speech recognition (AVSR). Specifically, we complement our earlier work, which introduced efficient convolutional neural networks (CNNs) for visual-only speech recognition, by focusing here on the sequence modeling component of the architecture, proposing a novel resource-efficient time-delay neural network (TDNN) that we extend for AVSR. In more detail, we introduce the sTDNN-F module, which combines the factored TDNN (TDNN-F) with grouped fully-connected layers and the shuffle operation. We then develop an AVSR system based on the sTDNN-F, incorporating the efficient CNNs of our earlier work and other standard visual processing and speech recognition modules. We evaluate our approach on the popular TCD-TIMIT corpus, under two speaker-independent training/testing scenarios. Our best sTDNN-F based AVSR system turns out to be 74% more efficient than a traditional TDNN-based one and 35% more efficient than a TDNN-F one, while maintaining similar recognition accuracy and noise robustness, and also significantly outperforming its audio-only counterpart. © 2021 European Signal Processing Conference. All rights reserved.
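As a rough illustration of the building blocks named in the abstract (factored layers, grouped fully-connected layers, and the shuffle operation), the following is a minimal sketch and not the authors' implementation. It assumes a ShuffleNet-style channel shuffle and counts only weight parameters, ignoring biases and temporal context; the function names and the layer sizes (512 inputs and outputs, rank 128, 4 groups) are hypothetical and do not come from the paper.

```python
def channel_shuffle(features, groups):
    """Interleave channels across groups (ShuffleNet-style shuffle, assumed here).

    Channels are viewed as a (groups, per_group) grid, transposed, and
    flattened, so the next grouped layer sees channels from every group.
    """
    n = len(features)
    if n % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    return [features[g * per_group + c]
            for c in range(per_group)
            for g in range(groups)]


def dense_fc_params(d_in, d_out):
    """Weights of a plain fully-connected layer (no bias)."""
    return d_in * d_out


def factored_fc_params(d_in, d_out, rank):
    """Weights when the matrix is factored into two low-rank matrices."""
    return d_in * rank + rank * d_out


def grouped_fc_params(d_in, d_out, groups):
    """Weights when inputs and outputs are split into independent groups."""
    if d_in % groups != 0 or d_out % groups != 0:
        raise ValueError("dimensions must be divisible by groups")
    return groups * (d_in // groups) * (d_out // groups)


if __name__ == "__main__":
    # Hypothetical sizes chosen only to show the relative savings.
    print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=2))
    print(dense_fc_params(512, 512))
    print(factored_fc_params(512, 512, rank=128))
    print(grouped_fc_params(512, 512, groups=4))
```

With these illustrative sizes, grouping by 4 cuts the weight count of a layer by a factor of 4, and the shuffle restores information flow between groups that grouping alone would block. The efficiency figures reported in the abstract come from the authors' own configuration and measurements, which this sketch does not reproduce.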
URI
http://hdl.handle.net/11615/75305
Collections
  • Publications in journals, conferences, book chapters, etc. [19735]

Related items

Showing items related by title, author, creator and subject.

  • Deep View2View Mapping for View-Invariant Lipreading

    Koumparoulis A., Potamianos G. (2019)
    Recently, visual-only and audio-visual speech recognition have made significant progress thanks to deep-learning based, trainable visual front-ends (VFEs), with most research focusing on frontal or near-frontal face videos. ...
  • Multimodal fusion and sequence learning for cued speech recognition from videos

    Papadimitriou K., Parelli M., Sapountzaki G., Pavlakos G., Maragos P., Potamianos G. (2021)
    Cued Speech (CS) constitutes a non-vocal mode of communication that relies on lip movements in conjunction with hand positional and gestural cues, in order to disambiguate phonetic information and make it accessible to the ...
  • Resource-adaptive deep learning for visual speech recognition

    Koumparoulis A., Potamianos G., Thomas S., da Silva Morais E. (2020)
    We focus on the problem of efficient architectures for lipreading that allow trading-off computational resources for visual speech recognition accuracy. In particular, we make two contributions: First, we introduce ...