MobilipNet: Resource-efficient deep learning based lipreading

Koumparoulis A., Potamianos G.

dc.creator	Koumparoulis A., Potamianos G.	en
dc.date.accessioned	2023-01-31T08:45:24Z
dc.date.available	2023-01-31T08:45:24Z
dc.date.issued	2019
dc.identifier	10.21437/Interspeech.2019-2618
dc.identifier.issn	2308457X
dc.identifier.uri	http://hdl.handle.net/11615/75303
dc.description.abstract	Recent works in visual speech recognition utilize deep learning advances to improve accuracy. Focus however has been primarily on recognition performance, while ignoring the computational burden of deep architectures. In this paper we address these issues concurrently, aiming at both high computational efficiency and recognition accuracy in lipreading. For this purpose, we investigate the MobileNet convolutional neural network architectures, recently proposed for image classification. In addition, we extend the 2D convolutions of MobileNets to 3D ones, in order to better model the spatio-temporal nature of the lipreading problem. We investigate two architectures in this extension, introducing the temporal dimension as part of either the depthwise or the pointwise MobileNet convolutions. To further boost computational efficiency, we also consider using pointwise convolutions alone, as well as networks operating on half the mouth region. We evaluate the proposed architectures on speaker-independent visual-only continuous speech recognition on the popular TCD-TIMIT corpus. Our best system outperforms a baseline CNN by 4.27% absolute in word error rate and over 12 times in computational efficiency, whereas, compared to a state-of-the-art ResNet, it is 37 times more efficient at a minor 0.07% absolute error rate degradation. Copyright © 2019 ISCA	en
dc.language.iso	en	en
dc.source	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH	en
dc.source.uri	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85074713343&doi=10.21437%2fInterspeech.2019-2618&partnerID=40&md5=0fc8941672ba82ab55dce1b938093057
dc.subject	International Speech Communication Association	en
dc.title	MobilipNet: Resource-efficient deep learning based lipreading	en
dc.type	conferenceItem	en
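
The abstract describes moving the temporal dimension into either the depthwise or the pointwise convolution of a MobileNet block. A minimal sketch of how those choices affect weight counts follows. It assumes the temporal extent is added as a t×k×k depthwise kernel or a t×1×1 pointwise kernel. That reading, the function names, and the layer sizes are illustrative assumptions, not details taken from the paper. Biases and batch normalization are ignored.

```python
def standard_conv_weights(c_in, c_out, k, t=1):
    """Weights in a dense convolution with a t x k x k kernel (t=1 means 2D)."""
    return c_in * c_out * k * k * t


def separable_conv_weights(c_in, c_out, k, t_depthwise=1, t_pointwise=1):
    """Weights in a depthwise-separable block.

    Assumed layout (hypothetical, for illustration only):
      depthwise: one t_depthwise x k x k filter per input channel
      pointwise: a t_pointwise x 1 x 1 filter mixing all channels
    """
    depthwise = c_in * k * k * t_depthwise
    pointwise = c_in * c_out * t_pointwise
    return depthwise + pointwise


# Hypothetical layer sizes, not taken from the paper.
C_IN, C_OUT, K, T = 64, 128, 3, 3

counts = {
    "standard 3D": standard_conv_weights(C_IN, C_OUT, K, T),
    "separable 2D": separable_conv_weights(C_IN, C_OUT, K),
    "temporal in depthwise": separable_conv_weights(C_IN, C_OUT, K, t_depthwise=T),
    "temporal in pointwise": separable_conv_weights(C_IN, C_OUT, K, t_pointwise=T),
}

for name, n in counts.items():
    print(f"{name}: {n} weights")
```

Under these assumed sizes, placing the temporal extent in the depthwise kernel adds far fewer weights than placing it in the pointwise kernel, because the pointwise term scales with the product of input and output channels.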


This item appears in the following collection(s)

Publications in journals, conferences, book chapters, etc. [19705]
