University of Thessaly Institutional Repository

SPATIO-TEMPORAL GRAPH CONVOLUTIONAL NETWORKS FOR CONTINUOUS SIGN LANGUAGE RECOGNITION

Authors
Parelli M., Papadimitriou K., Potamianos G., Pavlakos G., Maragos P.
Date
2022
Language
en
DOI
10.1109/ICASSP43922.2022.9746971
Keywords
Computer vision
Convolution
Convolutional neural networks
Deep learning
BiLSTM
Continuous sign language recognition
Convolutional networks
CTC
Expose
Learning frameworks
Pose information
Sign Language recognition
Spatio-temporal graph convolutional network
Spatio-temporal graphs
Blending
Institute of Electrical and Electronics Engineers Inc.
Abstract
We address the challenging problem of continuous sign language recognition (CSLR) from RGB videos, proposing a novel deep-learning framework that employs spatio-temporal graph convolutional networks (ST-GCNs). These networks operate on multiple, appropriately fused feature streams that capture the signer's pose, shape, appearance, and motion information. In addition to introducing such networks to the continuous recognition problem, our model's novelty lies in: (i) the feature streams considered and their blending into three ST-GCN modules; (ii) the combination of such modules with bi-directional long short-term memory networks, thus capturing both short-term embedded signing dynamics and long-range feature dependencies; and (iii) the fusion scheme, where the resulting modules operate in parallel, with their posteriors aligned via a guiding connectionist temporal classification (CTC) method and fused for sign gloss prediction. Notably, concerning (i), in addition to traditional CSLR features, we investigate the utility of 3D human pose and shape parameterization via the “ExPose” approach, as well as 3D skeletal joint information that is regressed from detected 2D joints. We evaluate the proposed system on two well-known CSLR benchmarks, conducting extensive ablations on its modules. We achieve a new state of the art on one of the two datasets, while reaching very competitive performance on the other. © 2022 IEEE
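The final stage described in the abstract, where parallel modules produce per-frame posteriors that are fused for gloss prediction, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the fusion rule (a weighted average of log-posteriors), the blank index `BLANK = 0`, the function names `fuse_log_posteriors` and `ctc_greedy_decode`, and the use of greedy (best-path) CTC decoding. The paper's guiding CTC alignment is not reproduced.

```python
import math

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def fuse_log_posteriors(streams, weights=None):
    """Fuse per-frame log-posteriors from several streams.

    streams: list of streams, each a list of T frames, each frame a list
             of V log-probabilities. All streams must share T and V.
    weights: optional per-stream weights; defaults to a uniform average.
    This weighted average is an illustrative stand-in, not the paper's rule.
    """
    n = len(streams)
    if weights is None:
        weights = [1.0 / n] * n
    num_frames = len(streams[0])
    vocab_size = len(streams[0][0])
    return [
        [sum(w * s[t][v] for w, s in zip(weights, streams))
         for v in range(vocab_size)]
        for t in range(num_frames)
    ]


def ctc_greedy_decode(log_probs, blank=BLANK):
    """Best-path CTC decoding: take the argmax per frame, merge
    consecutive repeats, then drop blank symbols."""
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in log_probs]
    glosses = []
    previous = None
    for label in best_path:
        if label != previous and label != blank:
            glosses.append(label)
        previous = label
    return glosses


def to_log(frames):
    """Helper: convert per-frame probabilities to log-probabilities."""
    return [[math.log(p) for p in frame] for frame in frames]


# Two toy streams over a vocabulary {blank, gloss 1, gloss 2}, five frames.
stream_a = to_log([[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.8, 0.1, 0.1],
                   [0.1, 0.1, 0.8], [0.1, 0.1, 0.8]])
stream_b = to_log([[0.2, 0.7, 0.1], [0.6, 0.3, 0.1], [0.7, 0.2, 0.1],
                   [0.2, 0.2, 0.6], [0.1, 0.2, 0.7]])
fused = fuse_log_posteriors([stream_a, stream_b])
predicted = ctc_greedy_decode(fused)
```

In this toy example the second stream alone would emit a blank at the second frame, but the fused scores still favor gloss 1 there, which is the kind of disagreement a fusion step is meant to smooth over.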
URI
http://hdl.handle.net/11615/77936
Collections
  • Publications in journals, conferences, book chapters, etc. [19735]

Related items

Showing items related by title, author, creator and subject.

  • Multimodal fusion and sequence learning for cued speech recognition from videos

    Papadimitriou K., Parelli M., Sapountzaki G., Pavlakos G., Maragos P., Potamianos G. (2021)
    Cued Speech (CS) constitutes a non-vocal mode of communication that relies on lip movements in conjunction with hand positional and gestural cues, in order to disambiguate phonetic information and make it accessible to the ...
  • A fully convolutional sequence learning approach for cued speech recognition from videos

    Papadimitriou K., Potamianos G. (2021)
    Cued Speech constitutes a sign-based communication variant for the speech and hearing impaired, which involves visual information from lip movements combined with hand positional and gestural cues. In this paper, we consider ...
  • Look-behind fully convolutional neural network for computer-aided endoscopy

    Diamantis D.E., Iakovidis D.K., Koulaouzidis A. (2019)
    In this paper, we propose a novel Fully Convolutional Neural Network (FCN) architecture aiming to aid the detection of abnormalities, such as polyps, ulcers and blood, in gastrointestinal (GI) endoscopy images. The proposed ...