Abstract
In recent years, voice activity detection has been a highly researched field, owing to its importance as an input stage in many real-world applications. Automated detection of vocalisations in the very first year of life, however, remains a stepchild of this field. On our quest to define acoustic parameters in pre-linguistic vocalisations as markers of neuro(mal)development, we are confronted with the challenge of manually segmenting and annotating hours of variable-quality home video material for sequences of infant voice/vocalisations. While our corpus comprises more than a year of running time of video footage of typically developing infants and infants with various neurodevelopmental disorders, only a small proportion has been processed so far. This calls for automated assistance tools for detecting and/or segmenting infant utterances from real-life video recordings. In this paper, we investigated several approaches to infant voice detection and segmentation, including a rule-based voice activity detector, hidden Markov models with Gaussian mixture observation models, support vector machines, and random forests. Results indicate that the applied methods are well suited to semi-automated retrieval of infant utterances from highly non-standardised footage. At the same time, our results show that a fully automated approach to this problem is yet to come.
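As a minimal illustration of the rule-based voice activity detection family mentioned above (this is a generic energy-thresholding sketch of our own, not the authors' implementation; the function name, frame parameters, and threshold are assumptions), each audio frame can be flagged as voice or non-voice by comparing its log energy against a threshold relative to the loudest frame:

```python
import numpy as np

def energy_vad(signal, sample_rate, frame_ms=25, hop_ms=10, threshold_db=-35.0):
    """Flag each frame as voiced (True) or silent (False) by comparing
    its log energy against a threshold relative to the loudest frame.
    Returns a boolean array with one entry per frame."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop_len)
    energies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop_len : i * hop_len + frame_len]
        # Small constant avoids log(0) on all-zero frames.
        energies[i] = np.sum(frame.astype(np.float64) ** 2) + 1e-12
    log_e = 10.0 * np.log10(energies / energies.max())
    return log_e > threshold_db

# Toy example: 0.5 s of silence followed by 0.5 s of a 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr // 2) / sr
sig = np.concatenate([np.zeros(sr // 2), 0.5 * np.sin(2 * np.pi * 440 * t)])
frames = energy_vad(sig, sr)
```

In practice such a fixed threshold is fragile on non-standardised home video audio, which is one motivation for the learned models (HMM-GMM, SVM, random forest) the paper compares it against.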
| Original language | English |
|---|---|
| Title of host publication | Interspeech 2016, 8-12 Sep 2016, San Francisco |
| Editors | Nelson Morgan |
| Publisher | ISCA |
| Pages | 2997-3001 |
| Number of pages | 5 |
| DOIs | |
| Publication status | Published - 1 Jan 2016 |
| Externally published | Yes |
| Event | 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 - San Francisco, United States; Duration: 8 Sept 2016 → 12 Sept 2016 |
Conference
| Conference | 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 |
|---|---|
| Country/Territory | United States |
| City | San Francisco |
| Period | 8/09/16 → 12/09/16 |
Funding
The authors acknowledge funding from the Austrian Science Fund (FWF; P25241), the National Bank of Austria (OeNB; P16430), BioTechMed-Graz, the General Movements Trust, and the EU's H2020 Programme via RIA #688835 (DEENIGMA). Special thanks go to Andreas Kimmerle, Jorge Luis Moye, Sergio Roccabado, Iris Tomantschger, Adriana Villarroel, Diego Villarroel, and Claudia Zitta for their assistance in the vocalisation segmentation process. Moreover, thanks to Gunter Vogrinec for contributing to the dataset description. The authors express their gratitude to all parents who provided us with home video material for scientific analysis.
Keywords
- Home video database
- Infant vocalisation
- Retrospective audio-video analysis
- Voice activity detection