Browse by subject "European Signal Processing Conference, EUSIPCO"

An Audiovisual Child Emotion Recognition System for Child-Robot Interaction Applications

Filntisis P.P., Efthymiou N., Potamianos G., Maragos P. (2021)

We present an audiovisual emotion recognition system tailored to child-robot interaction scenarios. Our proposed system is based on deep learning and the Temporal Segment Networks framework, receives input from both the ...

Fingerspelled alphabet sign recognition in upper-body videos

Papadimitriou K., Potamianos G. (2019)

Fingerspelling is a crucial part of sign-based communication; however, its recognition remains a challenging and mostly overlooked computer vision problem. To address it, this paper presents a system that recognizes the 24 ...

A fully convolutional sequence learning approach for cued speech recognition from videos

Papadimitriou K., Potamianos G. (2021)

Cued Speech constitutes a sign-based communication variant for the speech and hearing impaired, which involves visual information from lip movements combined with hand positional and gestural cues. In this paper, we consider ...

H-V shadow detection based on electromagnetism-like optimization

Koutsiou D.-C.C., Savelonas M., Iakovidis D.K. (2021)

Shadow detection is useful in a variety of image analysis applications, as it can improve scene understanding. Most of the recent shadow detection approaches use near-infrared (NIR) cameras and deep learning to provide ...

Multi-channel non-negative matrix factorization for overlapped acoustic event detection

Giannoulis P., Potamianos G., Maragos P. (2018)

In this paper, we propose two multi-channel extensions of non-negative matrix factorization (NMF) for acoustic event detection. The first method performs decision fusion on the activation matrices produced from independent ...

Overlapped Sound Event Classification via Multi-Channel Sound Separation Network

Giannoulis P., Potamianos G., Maragos P. (2021)

Overlapped sound event classification (SEC) can be a challenging task, especially in scenarios where the number of possible event classes or the number of simultaneously occurring events (polyphony level) is large. In such ...

Resource-efficient TDNN Architectures for Audio-visual Speech Recognition

Koumparoulis A., Potamianos G., Thomas S., da Silva Morais E. (2021)

In this paper, we consider the problem of resource-efficient architectures for audio-visual automatic speech recognition (AVSR). Specifically, we complement our earlier work that introduced efficient convolutional neural ...