Abstract
Speech Emotion Recognition (SER) refers to the recognition of human emotions from natural speech and is vital for building human-centered, context-aware intelligent systems. Domain shift, where models trained on one domain exhibit performance degradation when exposed to an unseen domain with different statistics, is a major limiting factor in the applicability of SER, as models depend strongly on the speaker and language characteristics seen during training. Meta-Learning for Domain Generalization (MLDG) has shown great success in improving models' generalization capacity and alleviating the domain shift problem in the vision domain; yet its efficacy on SER remains largely unexplored. In this work, we propose a "domain-shift aware" MLDG approach to learn generalizable models across multiple domains in SER. Based on our extensive evaluation, we identify a number of pitfalls that contribute to models' poor domain generalization (DG) ability, and demonstrate that log-mel spectrogram representations lack the distinct features required for MLDG in SER. We further explore the use of appropriate features to achieve DG in SER, so as to provide insights into future research directions for DG in SER.
Original language | English |
---|---|
Title of host publication | 2023 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events, PerCom Workshops 2023 |
Publisher | Institute of Electrical and Electronics Engineers |
Pages | 421-426 |
Number of pages | 6 |
ISBN (Electronic) | 978-1-6654-5381-3 |
Publication status | Published - 21 Jun 2023 |
Event | 2023 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events, PerCom Workshops 2023 - Atlanta, United States; Duration: 13 Mar 2023 → 17 Mar 2023 |
Conference
Conference | 2023 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events, PerCom Workshops 2023 |
---|---|
Country/Territory | United States |
City | Atlanta |
Period | 13/03/23 → 17/03/23 |
Funding
This work is partially performed in the context of the Distributed Artificial Intelligent Systems project supported by the ECSEL Joint Undertaking.
Keywords
- Deep learning
- Domain shift
- Domain generalization
- Speech emotion recognition
- Meta-learning