A fully convolutional sequence learning approach for cued speech recognition from videos
Abstract
Cued Speech constitutes a sign-based communication variant for the speech and hearing impaired, combining visual information from lip movements with positional and gestural hand cues. In this paper, we consider its automatic recognition in videos, introducing a deep sequence learning approach that consists of two separately trained components: an image learner based on convolutional neural networks (CNNs) and a fully convolutional encoder-decoder. Specifically, handshape and lip visual features extracted by a 3D-CNN feature learner, together with hand position embeddings obtained by a 2D-CNN, are concatenated and fed to a time-depth separable (TDS) convolutional block structure, followed by a multi-step attention-based convolutional decoder for phoneme prediction. To our knowledge, this is the first work to address cued speech recognition with a unified modeling approach based entirely on CNNs. The introduced model is evaluated on a French and a British English cued speech dataset in terms of phoneme error rate, and it is shown to significantly outperform alternative modeling approaches.

© 2021 European Signal Processing Conference, EUSIPCO. All rights reserved.
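The fusion step described above, where per-frame features from the three streams are concatenated before entering the TDS encoder, can be sketched as follows. This is a minimal illustration only: the feature dimensions (512 for each 3D-CNN visual stream, 64 for the 2D-CNN hand position embedding) and the sequence length are assumed values, not figures reported in the paper.

```python
import numpy as np

T = 100  # number of video frames (illustrative)

# Hypothetical per-frame features from the separately trained learners:
lip_feats = np.random.randn(T, 512)        # lip stream from the 3D-CNN (assumed dim)
handshape_feats = np.random.randn(T, 512)  # handshape stream from the 3D-CNN (assumed dim)
handpos_emb = np.random.randn(T, 64)       # hand position embedding from the 2D-CNN (assumed dim)

# Concatenate along the feature axis; the fused sequence is then fed to the
# TDS encoder and the attention-based convolutional decoder.
fused = np.concatenate([lip_feats, handshape_feats, handpos_emb], axis=-1)
print(fused.shape)  # (100, 1088)
```

The key design point this sketch reflects is early feature-level fusion: all three visual cue streams are merged into a single sequence so that one encoder-decoder models their joint temporal dynamics.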