Resource-efficient TDNN Architectures for Audio-visual Speech Recognition

Authors
Koumparoulis A., Potamianos G., Thomas S., da Silva Morais E.
Date
2021
Language
en
DOI
10.23919/EUSIPCO54536.2021.9616215
Subject
Audio acoustics
Convolutional neural networks
Network architecture
Speech recognition
Audio-visual
Audio-visual automatic speech recognition
Audiovisual speech recognition
Automatic speech recognition
Automatic speech recognition system
Convolutional neural network
MobiLipNet
Neural network architecture
Resource-efficient
Time delay neural networks
Computational efficiency
European Signal Processing Conference, EUSIPCO
Abstract
In this paper, we consider the problem of resource-efficient architectures for audio-visual automatic speech recognition (AVSR). Specifically, we complement our earlier work, which introduced efficient convolutional neural networks (CNNs) for visual-only speech recognition, by focusing here on the sequence modeling component of the architecture: we propose a novel resource-efficient time-delay neural network (TDNN) and extend it for AVSR. In more detail, we introduce the sTDNN-F module, which combines the factored TDNN (TDNN-F) with grouped fully-connected layers and the shuffle operation. We then develop an AVSR system based on the sTDNN-F, incorporating the efficient CNNs of our earlier work and other standard visual processing and speech recognition modules. We evaluate our approach on the popular TCD-TIMIT corpus, under two speaker-independent training/testing scenarios. Our best sTDNN-F based AVSR system proves 74% more efficient than a traditional TDNN-based one and 35% more efficient than a TDNN-F-based one, while maintaining similar recognition accuracy and noise robustness, and while significantly outperforming its audio-only counterpart. © 2021 European Signal Processing Conference. All rights reserved.
URI
http://hdl.handle.net/11615/75305
Collections
  • Publications in journals, conferences, book chapters, etc. [19735]
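
Illustrative sketch
The abstract names two building blocks that the sTDNN-F module adds to the factored TDNN: grouped fully-connected layers and the shuffle operation. The record gives no implementation details, so the NumPy sketch below is only a generic illustration of those two ideas, not the authors' module. All function names, shapes, and the group count are assumptions chosen for the example.

```python
import numpy as np


def grouped_linear(x, weights):
    """Grouped fully-connected layer (hypothetical helper, not from the paper).

    x:       array of shape (frames, d_in)
    weights: list of G matrices, each of shape (d_in // G, d_out // G)
    Each feature group is projected by its own small matrix, so no weight
    connects features that belong to different groups.
    """
    groups = len(weights)
    assert x.shape[1] % groups == 0, "d_in must be divisible by the group count"
    chunks = np.split(x, groups, axis=1)
    return np.concatenate([c @ w for c, w in zip(chunks, weights)], axis=1)


def channel_shuffle(x, groups):
    """Interleave features across groups so the next grouped layer mixes them."""
    frames, features = x.shape
    assert features % groups == 0, "feature count must be divisible by groups"
    x = x.reshape(frames, groups, features // groups)
    return x.transpose(0, 2, 1).reshape(frames, features)


def dense_params(d_in, d_out):
    """Weight count of an ordinary fully-connected layer (bias ignored)."""
    return d_in * d_out


def grouped_params(d_in, d_out, groups):
    """Weight count of the grouped layer above (bias ignored)."""
    return groups * (d_in // groups) * (d_out // groups)
```

Under these assumptions, a 512-to-512 projection needs 262144 weights as a dense layer and 65536 weights when split into 4 groups. The shuffle step costs no parameters; it only reorders features so that information can cross group boundaries in the following grouped layer.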

Related items

Showing items related by title, author, creator and subject.

  • Deep View2View Mapping for View-Invariant Lipreading

    Koumparoulis A., Potamianos G. (2019)
    Recently, visual-only and audio-visual speech recognition have made significant progress thanks to deep-learning based, trainable visual front-ends (VFEs), with most research focusing on frontal or near-frontal face videos. ...
  • Multimodal fusion and sequence learning for cued speech recognition from videos

    Papadimitriou K., Parelli M., Sapountzaki G., Pavlakos G., Maragos P., Potamianos G. (2021)
    Cued Speech (CS) constitutes a non-vocal mode of communication that relies on lip movements in conjunction with hand positional and gestural cues, in order to disambiguate phonetic information and make it accessible to the ...
  • Resource-adaptive deep learning for visual speech recognition

    Koumparoulis A., Potamianos G., Thomas S., da Silva Morais E. (2020)
    We focus on the problem of efficient architectures for lipreading that allow trading off computational resources for visual speech recognition accuracy. In particular, we make two contributions: First, we introduce ...