A fully convolutional sequence learning approach for cued speech recognition from videos

Papadimitriou K., Potamianos G.

dc.creator	Papadimitriou K., Potamianos G.	en
dc.date.accessioned	2023-01-31T09:42:21Z
dc.date.available	2023-01-31T09:42:21Z
dc.date.issued	2021
dc.identifier	10.23919/Eusipco47968.2020.9287365
dc.identifier.isbn	9789082797053
dc.identifier.issn	22195491
dc.identifier.uri	http://hdl.handle.net/11615/77585
dc.description.abstract	Cued Speech constitutes a sign-based communication variant for the speech and hearing impaired, which involves visual information from lip movements combined with hand positional and gestural cues. In this paper, we consider its automatic recognition in videos, introducing a deep sequence learning approach that consists of two separately trained components: an image learner based on convolutional neural networks (CNNs) and a fully convolutional encoder-decoder. Specifically, handshape and lip visual features extracted from a 3D-CNN feature learner, as well as hand position embeddings obtained by a 2D-CNN, are concatenated and fed to a time-depth separable (TDS) block structure, followed by a multi-step attention-based convolutional decoder for phoneme prediction. To our knowledge, this is the first work where recognition of cued speech is addressed using a common modeling approach based entirely on CNNs. The introduced model is evaluated on a French and a British English cued speech dataset in terms of phoneme error rate, and it is shown to significantly outperform alternative modeling approaches. © 2021 European Signal Processing Conference, EUSIPCO. All rights reserved.	en
dc.language.iso	en	en
dc.source	European Signal Processing Conference	en
dc.source.uri	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85099302646&doi=10.23919%2fEusipco47968.2020.9287365&partnerID=40&md5=b7ff5d54979070097dcafcff0e903a24
dc.subject	Audition	en
dc.subject	Convolution	en
dc.subject	Convolutional neural networks	en
dc.subject	Decoding	en
dc.subject	Deep learning	en
dc.subject	Signal processing	en
dc.subject	Speech	en
dc.subject	Speech communication	en
dc.subject	Visual communication	en
dc.subject	Automatic recognition	en
dc.subject	Block structures	en
dc.subject	British English	en
dc.subject	Convolutional decoders	en
dc.subject	Convolutional encoders	en
dc.subject	Hearing impaired	en
dc.subject	Sequence learning	en
dc.subject	Visual information	en
dc.subject	Speech recognition	en
dc.subject	European Signal Processing Conference, EUSIPCO	en
dc.title	A fully convolutional sequence learning approach for cued speech recognition from videos	en
dc.type	conferenceItem	en
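
Note on the evaluation metric: the abstract reports results as phoneme error rate (PER) but this record does not say how it was scored. The sketch below assumes the conventional definition, the Levenshtein (edit) distance between the predicted and reference phoneme sequences divided by the total number of reference phonemes. The function names and the example phoneme symbols are hypothetical and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of symbols)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference phoneme
                curr[j - 1] + 1,     # insertion of a hypothesis phoneme
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(references, hypotheses):
    """Corpus-level PER: total edit errors divided by total reference length."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_ref = sum(len(r) for r in references)
    if total_ref == 0:
        raise ValueError("reference set contains no phonemes")
    return total_errors / total_ref


# Hypothetical example: one deletion ("n") and one insertion ("s").
refs = [["b", "o", "n", "j", "u", "r"]]
hyps = [["b", "o", "j", "u", "r", "s"]]
per = phoneme_error_rate(refs, hyps)
```

Errors are pooled over the whole test set before dividing, so longer utterances carry proportionally more weight than in a per-utterance average.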

Files in this item

There are no files associated with this item.

This item appears in the following Collection(s)

Publications in journals, conferences, book chapters, etc. [19705]
