Browse by Subject "Speech recognition"
Results 1-20 of 28
The Athena-RC system for speech activity detection and speaker localization in the DIRHA smart home
(2014) We present our system for speech activity detection and speaker localization inside a smart home with multiple rooms equipped with microphone arrays of known geometry and placement. The smart home is developed as part of ...
ATHENA: A Greek multi-sensory database for home automation control
(2014) In this paper we present a Greek speech database with real multi-modal data in a smart-home two-room environment. In total, 20 speakers were recorded in 240 one-minute-long sessions. The recordings include utterances of ...
Audio-visual speech activity detection in a two-speaker scenario incorporating depth information from a profile or frontal view
(2017) Motivated by the increasing popularity of depth visual sensors, such as the Kinect device, we investigate the utility of depth information in audio-visual speech activity detection. A two-subject scenario is assumed, allowing ...
Audio-visual speech recognition incorporating facial depth information captured by the Kinect
(2012) We investigate the use of facial depth data of a speaking subject, captured by the Kinect device, as an additional speech-informative modality to incorporate into a traditional audio-visual automatic speech recognizer. We ...
Audio-visual speech recognition using depth information from the Kinect in noisy video conditions
(2012) In this paper we build on our recent work, where we successfully incorporated facial depth data of a speaker, captured by the Microsoft Kinect device, as a third data stream in an audio-visual automatic speech recognizer. ...
An Audiovisual Child Emotion Recognition System for Child-Robot Interaction Applications
(2021) We present an audiovisual emotion recognition system tailored to child-robot interaction scenarios. Our proposed system is based on deep learning and the Temporal Segment Networks framework, receives input from both the ...
ChildBot: Multi-robot perception and interaction with children
(2022) In this paper, we present an integrated robotic system capable of participating in and performing a wide range of educational and entertainment tasks in collaboration with one or more children. The system, called ChildBot, ...
Deep View2View Mapping for View-Invariant Lipreading
(2019) Recently, visual-only and audio-visual speech recognition have made significant progress thanks to deep-learning-based, trainable visual front-ends (VFEs), with most research focusing on frontal or near-frontal face videos. ...
Detecting audio-visual synchrony using deep neural networks
(2015) In this paper, we address the problem of automatically detecting whether the audio and visual speech modalities in frontal-pose videos are synchronous or not. This is of interest in a wide range of applications, for example ...
Discrimination and perception of the acoustic rendition of texts by blind people
(2007) This paper reports the results of a series of psychoacoustic experiments on the auditory representation, via synthetic speech, of texts that comprise similar acoustic patterns, so-called "paronyms". The ...
Emotion recognition from speech: A classroom experiment
(2018) In this position paper we present an approach for the recognition of emotions from speech. Our goal is to understand the affective state of learners during a learning process. We propose an approach that uses visual ...
Experiments on far-field multichannel speech processing in smart homes
(2013) In this paper, we examine three problems that arise in the modern, challenging area of far-field speech processing. The developed methods for each problem, namely (a) multichannel speech enhancement, (b) voice activity ...
Exploring ROI size in deep learning based lipreading
(2017) Automatic speechreading systems have increasingly exploited deep learning advances, resulting in dramatic gains over traditional methods. State-of-the-art systems typically employ convolutional neural networks (CNNs), ...
A fully convolutional sequence learning approach for cued speech recognition from videos
(2021) Cued Speech constitutes a sign-based communication variant for the speech- and hearing-impaired, which involves visual information from lip movements combined with hand positional and gestural cues. In this paper, we consider ...
Multi-microphone fusion for detection of speech and acoustic events in smart spaces
(2014) In this paper, we examine the challenging problem of detecting acoustic events and voice activity in smart indoor environments equipped with multiple microphones. In particular, we focus on channel combination strategies, ...
Multi-room speech activity detection using a distributed microphone network in domestic environments
(2015) Domestic environments are particularly challenging for distant speech recognition: reverberation, background noise and interfering sources, as well as the propagation of acoustic events across adjacent rooms, critically ...
Multi3: Multi-sensory perception system for multi-modal child interaction with multiple robots
(2018) Child-robot interaction is an interdisciplinary research area that has been attracting growing interest, primarily focusing on edutainment applications. A crucial factor in the successful deployment and wide adoption of ...
Multimodal fusion and sequence learning for cued speech recognition from videos
(2021) Cued Speech (CS) constitutes a non-vocal mode of communication that relies on lip movements in conjunction with hand positional and gestural cues in order to disambiguate phonetic information and make it accessible to the ...
A non-linguistic approach for human emotion recognition from speech
(2019) Understanding the user's emotional state is one of the most important issues in many aspects of human-computer interaction. In several applications, such as monitoring of humans in assistive living environments, ...
Precision-Based Weighted Blending Distributed Ensemble Model for Emotion Classification
(2022) Focusing on emotion recognition, this paper addresses the task of emotion classification and its performance with respect to accuracy, by investigating the capabilities of a distributed ensemble model using precision-based ...