On efficacy of Meta-Learning for Domain Generalization in Speech Emotion Recognition

Raeshak King Gandhi, Vasilis Tsouvalas, Nirvana Meratnia

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review

Abstract

Speech Emotion Recognition (SER) refers to the recognition of human emotions from natural speech and is vital for building human-centered, context-aware intelligent systems. Here, domain shift, where models trained on one domain exhibit performance degradation when exposed to an unseen domain with different statistics, is a major limiting factor in SER applicability, as models depend strongly on the speaker and language characteristics seen during training. Meta-Learning for Domain Generalization (MLDG) has shown great success in improving models' generalization capacity and alleviating the domain shift problem in the vision domain; yet its efficacy on SER remains largely unexplored. In this work, we propose a "domain-shift aware" MLDG approach to learn generalizable models across multiple domains in SER. Based on our extensive evaluation, we identify a number of pitfalls that contribute to models' poor domain generalization (DG) ability, and demonstrate that log-mel spectrogram representations lack the distinct features required for MLDG in SER. We further explore the use of appropriate features to achieve DG in SER, so as to provide insights into future research directions for DG in SER.
Original language: English
Title of host publication: 2023 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events, PerCom Workshops 2023
Publisher: Institute of Electrical and Electronics Engineers
Pages: 421-426
Number of pages: 6
ISBN (Electronic): 978-1-6654-5381-3
Publication status: Published - 21 Jun 2023
Event: 2023 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events, PerCom Workshops 2023 - Atlanta, United States
Duration: 13 Mar 2023 to 17 Mar 2023

Conference

Conference: 2023 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events, PerCom Workshops 2023
Country/Territory: United States
City: Atlanta
Period: 13/03/23 to 17/03/23

Funding

This work was partially performed in the context of the Distributed Artificial Intelligent Systems project, supported by the ECSEL Joint Undertaking.

Keywords

  • Deep learning
  • Domain shift
  • Domain generalization
  • Speech emotion recognition
  • Meta-learning
