ACCURATE AND RESOURCE-EFFICIENT LIPREADING WITH EFFICIENTNETV2 AND TRANSFORMERS

Koumparoulis A., Potamianos G.

dc.creator	Koumparoulis A., Potamianos G.	en
dc.date.accessioned	2023-01-31T08:45:23Z
dc.date.available	2023-01-31T08:45:23Z
dc.date.issued	2022
dc.identifier	10.1109/ICASSP43922.2022.9747729
dc.identifier.isbn	9781665405409
dc.identifier.issn	15206149
dc.identifier.uri	http://hdl.handle.net/11615/75301
dc.description.abstract	We present a novel resource-efficient end-to-end architecture for lipreading that achieves state-of-the-art results on a popular and challenging benchmark. In particular, we make the following contributions: First, inspired by the recent success of the EfficientNet architecture in image classification and by our earlier work on resource-efficient lipreading models (MobiLipNet), we introduce EfficientNets to the lipreading task. Second, we show that the 3D front-end currently most popular in the literature contains a max-pool layer that prevents networks from reaching superior performance, and we propose its removal. Finally, we improve the robustness of our system's back-end by including a Transformer encoder. We evaluate the proposed system on the “Lipreading In-The-Wild” (LRW) corpus, a database containing short video segments from BBC TV broadcasts. The proposed network (T-variant) attains 88.53% word accuracy, a 0.17% absolute improvement over the current state-of-the-art, while being five times less computationally intensive. Further, an up-scaled version of our model (L-variant) achieves 89.52%, a new state-of-the-art result on the LRW corpus. © 2022 IEEE	en
dc.language.iso	en	en
dc.source	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings	en
dc.source.uri	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85134038644&doi=10.1109%2fICASSP43922.2022.9747729&partnerID=40&md5=8e52210a1b9b3f99a047dd008af132de
dc.subject	Computer vision	en
dc.subject	Network layers	en
dc.subject	Efficientnet	en
dc.subject	End to end	en
dc.subject	Front end	en
dc.subject	Image classification	en
dc.subject	Lipreading	en
dc.subject	Performance	en
dc.subject	Resource-efficient	en
dc.subject	State of the art	en
dc.subject	Transformer	en
dc.subject	Video segments	en
dc.subject	Network architecture	en
dc.subject	Institute of Electrical and Electronics Engineers Inc.	en
dc.title	ACCURATE AND RESOURCE-EFFICIENT LIPREADING WITH EFFICIENTNETV2 AND TRANSFORMERS	en
dc.type	conferenceItem	en

This item appears in the following collection(s)

Publications in journals, conferences, book chapters, etc. [19705]
