
dc.creator: Papadimitriou K., Potamianos G. [en]
dc.date.accessioned: 2023-01-31T09:42:21Z
dc.date.available: 2023-01-31T09:42:21Z
dc.date.issued: 2021
dc.identifier: 10.23919/Eusipco47968.2020.9287365
dc.identifier.isbn: 9789082797053
dc.identifier.issn: 22195491
dc.identifier.uri: http://hdl.handle.net/11615/77585
dc.description.abstract: Cued Speech constitutes a sign-based communication variant for the speech and hearing impaired, which involves visual information from lip movements combined with hand positional and gestural cues. In this paper, we consider its automatic recognition in videos, introducing a deep sequence learning approach that consists of two separately trained components: an image learner based on convolutional neural networks (CNNs) and a fully convolutional encoder-decoder. Specifically, handshape and lip visual features extracted from a 3D-CNN feature learner, as well as hand position embeddings obtained by a 2D-CNN, are concatenated and fed to a time-depth separable (TDS) block structure, followed by a multi-step attention-based convolutional decoder for phoneme prediction. To our knowledge, this is the first work where recognition of cued speech is addressed using a common modeling approach based entirely on CNNs. The introduced model is evaluated on a French and a British English cued speech dataset in terms of phoneme error rate, and it is shown to significantly outperform alternative modeling approaches. © 2021 European Signal Processing Conference, EUSIPCO. All rights reserved. [en]
dc.language.iso: en [en]
dc.source: European Signal Processing Conference [en]
dc.source.uri: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85099302646&doi=10.23919%2fEusipco47968.2020.9287365&partnerID=40&md5=b7ff5d54979070097dcafcff0e903a24
dc.subject: Audition [en]
dc.subject: Convolution [en]
dc.subject: Convolutional neural networks [en]
dc.subject: Decoding [en]
dc.subject: Deep learning [en]
dc.subject: Signal processing [en]
dc.subject: Speech [en]
dc.subject: Speech communication [en]
dc.subject: Visual communication [en]
dc.subject: Automatic recognition [en]
dc.subject: Block structures [en]
dc.subject: British English [en]
dc.subject: Convolutional decoders [en]
dc.subject: Convolutional encoders [en]
dc.subject: Hearing impaired [en]
dc.subject: Sequence learning [en]
dc.subject: Visual information [en]
dc.subject: Speech recognition [en]
dc.subject: European Signal Processing Conference, EUSIPCO [en]
dc.title: A fully convolutional sequence learning approach for cued speech recognition from videos [en]
dc.type: conferenceItem [en]


