Resource-adaptive deep learning for visual speech recognition

Authors
Koumparoulis A., Potamianos G., Thomas S., da Silva Morais E.
Date
2020
Language
en
DOI
10.21437/Interspeech.2020-3003
Keywords
Convolutional neural networks
Deep learning
Network architecture
Speech communication
Computational resources
Continuous speech
Device resources
Efficient architecture
Frame rate
Operating points
Speaker independents
Visual speech recognition
Speech recognition
International Speech Communication Association
Abstract
We focus on the problem of efficient architectures for lipreading that allow trading off computational resources for visual speech recognition accuracy. In particular, we make two contributions. First, we introduce MobiLipNetV3, an efficient and accurate lipreading model, based on our earlier work on MobiLipNetV2 and incorporating recent advances in convolutional neural network architectures. Second, we propose a novel recognition paradigm, called MultiRate Ensemble (MRE), that combines a “lean” and a “full” MobiLipNetV3 in the lipreading pipeline, with the latter applied at a lower frame rate. This architecture yields a family of systems offering multiple accuracy vs. efficiency operating points, depending on the frame-rate decimation of the “full” model, thus allowing adaptation to the available device resources. We evaluate our approach on the TCD-TIMIT corpus, popular in speaker-independent lipreading of continuous speech. The proposed MRE family of systems can be up to 73 times more efficient than residual neural network based lipreading, and up to twice as efficient as MobiLipNetV2, while in both cases reaching up to 8% absolute WER reduction, depending on the chosen MRE operating point. For example, a temporal decimation of three yields a 7% absolute WER reduction and a 26% relative decrease in computations over MobiLipNetV2. © 2020 ISCA
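The multi-rate idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the two models' outputs are combined, so the hold-and-average fusion, the function names `multirate_ensemble` and `average_cost_per_frame`, and the `full_weight` parameter are all assumptions made for illustration. The sketch only shows the scheduling principle, in which a "lean" model runs on every frame while a "full" model runs on every `decimation`-th frame.

```python
from typing import Callable, List, Optional, Sequence

Frame = Sequence[float]
Scores = List[float]


def multirate_ensemble(
    frames: Sequence[Frame],
    lean_model: Callable[[Frame], Scores],
    full_model: Callable[[Frame], Scores],
    decimation: int,
    full_weight: float = 0.5,  # hypothetical fusion weight, not from the paper
) -> List[Scores]:
    """Run the lean model on every frame and the full model on every
    `decimation`-th frame, holding the last full output in between.

    The weighted-average fusion is an assumption made for this sketch.
    """
    if decimation < 1:
        raise ValueError("decimation must be >= 1")
    outputs: List[Scores] = []
    held_full: Optional[Scores] = None
    for t, frame in enumerate(frames):
        lean_scores = lean_model(frame)
        if t % decimation == 0:
            # The expensive model is evaluated only at the reduced frame rate.
            held_full = full_model(frame)
        assert held_full is not None  # set at t == 0
        outputs.append(
            [(1.0 - full_weight) * l + full_weight * f
             for l, f in zip(lean_scores, held_full)]
        )
    return outputs


def average_cost_per_frame(lean_cost: float, full_cost: float, decimation: int) -> float:
    """Idealized per-frame cost: the lean model always runs, the full model
    runs once every `decimation` frames (fusion overhead is ignored)."""
    return lean_cost + full_cost / decimation
```

Under this idealized cost model, raising `decimation` moves the system toward the cost of the lean model alone, which is one way to picture the family of operating points the abstract describes. The cost values passed to `average_cost_per_frame` are placeholders supplied by the caller, not measurements from the paper.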
URI
http://hdl.handle.net/11615/75307
Collections
  • Publications in journals, conferences, book chapters, etc. [19735]

Related documents

Documents with a similar title, author, creator, and subject.

  • ATHENA: A Greek multi-sensory database for home automation control

    Tsiami, A.; Rodomagoulakis, I.; Giannoulis, P.; Katsamanis, A.; Potamianos, G.; Maragos, P. (2014)
    In this paper we present a Greek speech database with real multi-modal data in a smart home two-room environment. In total, 20 speakers were recorded in 240 one-minute long sessions. The recordings include utterances of ...
  • Audio-visual speech recognition using depth information from the Kinect in noisy video conditions

    Galatas, G.; Potamianos, G.; Makedon, F. (2012)
    In this paper we build on our recent work, where we successfully incorporated facial depth data of a speaker captured by the Microsoft Kinect device, as a third data stream in an audio-visual automatic speech recognizer. ...
  • Multi-room speech activity detection using a distributed microphone network in domestic environments

    Giannoulis P., Brutti A., Matassoni M., Abad A., Katsamanis A., Matos M., Potamianos G., Maragos P. (2015)
    Domestic environments are particularly challenging for distant speech recognition: reverberation, background noise and interfering sources, as well as the propagation of acoustic events across adjacent rooms, critically ...