Logo
    • English
    • Ελληνικά
    • Deutsch
    • français
    • italiano
    • español
  • English 
    • English
    • Ελληνικά
    • Deutsch
    • français
    • italiano
    • español
  • Login
View Item 
  •   University of Thessaly Institutional Repository
  • Επιστημονικές Δημοσιεύσεις Μελών ΠΘ (ΕΔΠΘ)
  • Δημοσιεύσεις σε περιοδικά, συνέδρια, κεφάλαια βιβλίων κλπ.
  • View Item
  •   University of Thessaly Institutional Repository
  • Επιστημονικές Δημοσιεύσεις Μελών ΠΘ (ΕΔΠΘ)
  • Δημοσιεύσεις σε περιοδικά, συνέδρια, κεφάλαια βιβλίων κλπ.
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.
Institutional repository
All of DSpace
  • Communities & Collections
  • By Issue Date
  • Authors
  • Titles
  • Subjects

Exploiting 3D Hand Pose Estimation in Deep Learning-Based Sign Language Recognition from RGB Videos

Thumbnail
Author
Parelli M., Papadimitriou K., Potamianos G., Pavlakos G., Maragos P.
Date
2020
Language
en
DOI
10.1007/978-3-030-66096-3_18
Keyword
Computer hardware description languages
Computer vision
Convolutional neural networks
Data streams
Palmprint recognition
3D hand pose estimations
3D human pose estimation
Depth information
Encoder-decoder
Multiple streams
Recognition systems
Sign Language recognition
State of the art
Deep learning
Springer Science and Business Media Deutschland GmbH
Metadata display
Abstract
In this paper, we investigate the benefit of 3D hand skeletal information to the task of sign language (SL) recognition from RGB videos, within a state-of-the-art, multiple-stream, deep-learning recognition system. As most SL datasets are available in traditional RGB-only video lacking depth information, we propose to infer 3D coordinates of the hand joints from RGB data via a powerful architecture that has been primarily introduced in the literature for the task of 3D human pose estimation. We then fuse these estimates with additional SL informative streams, namely 2D skeletal data, as well as convolutional neural network-based hand- and mouth-region representations, and employ an attention-based encoder-decoder for recognition. We evaluate our proposed approach on a corpus of isolated signs of Greek SL and a dataset of continuous finger-spelling in American SL, reporting significant gains by the inclusion of 3D hand pose information, while also outperforming the state-of-the-art on both databases. Further, we evaluate the 3D hand pose estimation technique as standalone. © 2020, Springer Nature Switzerland AG.
URI
http://hdl.handle.net/11615/77937
Collections
  • Δημοσιεύσεις σε περιοδικά, συνέδρια, κεφάλαια βιβλίων κλπ. [19735]
htmlmap 

 

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

LoginRegister (MyDspace)
Help Contact
DepositionAboutHelpContact Us
Choose LanguageAll of DSpace
EnglishΕλληνικά
htmlmap