
dc.creator: Thermos S., Daras P., Potamianos G.
dc.date.accessioned: 2023-01-31T10:08:11Z
dc.date.available: 2023-01-31T10:08:11Z
dc.date.issued: 2020
dc.identifier: 10.1109/ICASSP40776.2020.9054167
dc.identifier.isbn: 9781509066315
dc.identifier.issn: 1520-6149
dc.identifier.uri: http://hdl.handle.net/11615/79695
dc.description.abstract: Learning to understand and infer object functionalities is an important step towards robust visual intelligence. Significant research efforts have recently focused on segmenting the object parts that enable specific types of human-object interaction, the so-called object affordances. However, most works treat this as a static semantic segmentation problem, focusing solely on object appearance and relying on strong supervision and object detection. In this paper, we propose a novel approach that exploits the spatio-temporal nature of human-object interaction for affordance segmentation. In particular, we design an autoencoder that is trained using ground-truth labels of only the last frame of the sequence, and is able to infer pixel-wise affordance labels in both videos and static images. Our model obviates the need for object labels and bounding boxes by using a soft-attention mechanism that enables the implicit localization of the interaction hotspot. For evaluation purposes, we introduce the SOR3D-AFF corpus, which consists of human-object interaction sequences and supports 9 types of affordances in terms of pixel-wise annotation, covering typical manipulations of tool-like objects. We show that our model achieves competitive results compared to strongly supervised methods on SOR3D-AFF, while being able to predict affordances for similar unseen objects in two affordance image-only datasets. © 2020 IEEE.
dc.language.iso: en
dc.source: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
dc.source.uri: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85089238483&doi=10.1109%2fICASSP40776.2020.9054167&partnerID=40&md5=8daf6328e04fe130a2ed5a45b05118e7
dc.subject: Audio signal processing
dc.subject: Image segmentation
dc.subject: Object detection
dc.subject: Pixels
dc.subject: Semantics
dc.subject: Speech communication
dc.subject: Attention mechanisms
dc.subject: Human-object interaction
dc.subject: Learning approach
dc.subject: Object appearance
dc.subject: Research efforts
dc.subject: Static semantics
dc.subject: Supervised methods
dc.subject: Visual intelligence
dc.subject: Deep learning
dc.subject: Institute of Electrical and Electronics Engineers Inc.
dc.title: A Deep Learning Approach to Object Affordance Segmentation
dc.type: conferenceItem
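
The soft-attention mechanism mentioned in the abstract can be sketched generically: a score is computed per spatial location of a feature map, a softmax turns the scores into a spatial weighting, and the weighted sum pools the features, implicitly localizing a hotspot without bounding boxes. The following is a minimal NumPy sketch under those assumptions; the function names and the learned score vector `w` are hypothetical, not the paper's actual implementation.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a flat score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def soft_attention(features, w):
    # features: C x H x W feature map; w: hypothetical learned
    # C-dim score vector (stands in for the model's scoring layer).
    c, h, wd = features.shape
    flat = features.reshape(c, h * wd)   # C x (H*W)
    scores = w @ flat                    # one score per spatial location
    weights = softmax(scores)            # spatial attention map, sums to 1
    attended = flat @ weights            # C-dim attended descriptor
    return attended, weights.reshape(h, wd)

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 4))
w = rng.standard_normal(8)
vec, attn = soft_attention(feats, w)
```

Because the attention map is normalized and non-negative, its peak can be read off as the implicitly localized interaction hotspot, which is how such mechanisms sidestep explicit object detection.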


Files in this item

There are no files associated with this item.
