No-audio multimodal speech detection in crowded social settings task at MediaEval 2018

  • Laura Cabrera-Quiros
  • Ekin Gedik
  • Hayley Hung

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


Abstract

This overview paper describes the automatic Human Behaviour Analysis (HBA) task at MediaEval 2018. In its first edition, the HBA task focuses on one of the most basic elements of social behavior: the estimation of speaking status. Task participants are provided with cropped videos of individuals interacting freely during a crowded mingle event, captured by an overhead camera. Each individual also wears a badge-like device, hung around the neck, that records tri-axial acceleration. The goal of the task is to automatically estimate whether a person is speaking using these two alternative modalities. In contrast to conventional speech detection approaches, no audio is used. Instead, the automatic estimation system must exploit the natural human movements that accompany speech. The task seeks to achieve estimation performance competitive with audio-based systems by exploiting the multimodal aspects of the problem. Copyright held by the owner/author(s).

Original language: English
Title of host publication: MediaEval 2018 Multimedia Benchmark Workshop
Subtitle of host publication: Working Notes Proceedings of the MediaEval 2018 Workshop, Sophia Antipolis, France, 29-31 October 2018
Editors: Martha Larson, Piyush Arora, Claire-Hélène Demarty
Publisher: CEUR-WS.org
Number of pages: 3
Publication status: Published - 1 Jan 2018
Externally published: Yes
Event: 2018 MediaEval Workshop - Sophia Antipolis, France
Duration: 29 Oct 2018 to 31 Oct 2018

Publication series

Name: CEUR Workshop Proceedings
Volume: 2283
ISSN (Print): 1613-0073

Conference

Conference: 2018 MediaEval Workshop
Country/Territory: France
City: Sophia Antipolis
Period: 29/10/18 to 31/10/18

Funding

This task is partially supported by the Instituto Tecnológico de Costa Rica and the Netherlands Organization for Scientific Research (NWO) under project number 639.022.606.
