ACCURATE AND RESOURCE-EFFICIENT LIPREADING WITH EFFICIENTNETV2 AND TRANSFORMERS
dc.creator | Koumparoulis A., Potamianos G. | en |
dc.date.accessioned | 2023-01-31T08:45:23Z | |
dc.date.available | 2023-01-31T08:45:23Z | |
dc.date.issued | 2022 | |
dc.identifier | 10.1109/ICASSP43922.2022.9747729 | |
dc.identifier.isbn | 9781665405409 | |
dc.identifier.issn | 15206149 | |
dc.identifier.uri | http://hdl.handle.net/11615/75301 | |
dc.description.abstract | We present a novel resource-efficient end-to-end architecture for lipreading that achieves state-of-the-art results on a popular and challenging benchmark. In particular, we make the following contributions: First, inspired by the recent success of the EfficientNet architecture in image classification and our earlier work on resource-efficient lipreading models (MobiLipNet), we introduce EfficientNets to the lipreading task. Second, we show that the 3D front-end currently most popular in the literature contains a max-pool layer that prevents networks from reaching superior performance, and we propose its removal. Finally, we improve our system's back-end robustness by including a Transformer encoder. We evaluate our proposed system on the “Lipreading In-The-Wild” (LRW) corpus, a database containing short video segments from BBC TV broadcasts. The proposed network (T-variant) attains 88.53% word accuracy, a 0.17% absolute improvement over the current state-of-the-art, while being five times less computationally intensive. Further, an up-scaled version of our model (L-variant) achieves 89.52%, a new state-of-the-art result on the LRW corpus. © 2022 IEEE | en |
dc.language.iso | en | en |
dc.source | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings | en |
dc.source.uri | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85134038644&doi=10.1109%2fICASSP43922.2022.9747729&partnerID=40&md5=8e52210a1b9b3f99a047dd008af132de | |
dc.subject | Computer vision | en |
dc.subject | Network layers | en |
dc.subject | Efficientnet | en |
dc.subject | End to end | en |
dc.subject | Front end | en |
dc.subject | Images classification | en |
dc.subject | Lipreading | en |
dc.subject | Performance | en |
dc.subject | Resource-efficient | en |
dc.subject | State of the art | en |
dc.subject | Transformer | en |
dc.subject | Video segments | en |
dc.subject | Network architecture | en |
dc.subject | Institute of Electrical and Electronics Engineers Inc. | en |
dc.title | ACCURATE AND RESOURCE-EFFICIENT LIPREADING WITH EFFICIENTNETV2 AND TRANSFORMERS | en |
dc.type | conferenceItem | en |
Files in this item
Files | Size | Format | View |
---|---|---|---|
There are no files associated with this item. |