Abstract
Whereas the human auditory system has a remarkable ability to focus on a particular target source in complex multi-source scenarios, it remains challenging to develop algorithms that can retrieve information about sound sources in a complex acoustic scene (e.g. to localize and identify active speech sources). This paper presents a robust binaural scene recognizer that simultaneously localizes and classifies a predefined number of target speech sources in the presence of reverberation and interfering noise. The model consists of three stages: localization of sound sources, detection of speech sources, and recognition of speaker identities. First, a binaural front-end is used to localize relevant sound source activity. The localization is based on the supervised learning of azimuth-dependent binaural features, namely interaural time and level differences (ITDs and ILDs). Based on this localization information, a binary mask is determined which identifies the activity of individual sound sources on a time-frequency (T-F) basis. Second, a speech detection module determines, for every sound source that has been found, whether the source type is speech or noise. For this purpose, the estimated binary mask and the corresponding spectral features are passed to a missing data classifier for each sound source candidate. Finally, the speaker identity of all detected speech sources is recognized. The proposed system is analyzed in simulated, adverse conditions including interfering noise, reverberation and the presence of multiple target sources. Compared to a state-of-the-art MFCC recognizer, the proposed model significantly improves speaker recognition accuracy.
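The binaural cues named in the abstract can be illustrated with a minimal sketch. It estimates one ITD and one ILD for a single full-band frame, whereas the abstract describes features evaluated on a time-frequency basis. The function name `itd_ild`, the 1 ms lag limit, and the sign convention (positive values indicate a source towards the right ear) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def itd_ild(left, right, fs, max_itd=1e-3):
    """Estimate ITD (seconds) and ILD (dB) for one frame of a binaural signal.

    Assumed convention: positive ITD means the left channel lags the right,
    positive ILD means the right channel carries more energy.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    max_lag = int(round(max_itd * fs))

    # Full cross-correlation; index (len(right) - 1) corresponds to zero lag.
    xcorr = np.correlate(left, right, mode="full")
    zero = len(right) - 1

    # Restrict the peak search to physically plausible lags.
    lags = np.arange(-max_lag, max_lag + 1)
    window = xcorr[zero - max_lag : zero + max_lag + 1]
    itd = lags[np.argmax(window)] / fs

    # Level difference as an energy ratio in dB (eps avoids log of zero).
    eps = 1e-12
    ild = 10.0 * np.log10((np.sum(right**2) + eps) / (np.sum(left**2) + eps))
    return itd, ild
```

In a system like the one described, such cues would typically be computed per frequency channel and per frame, then mapped to an azimuth by a trained model. That mapping is not shown here.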
| Original language | English |
|---|---|
| Title of host publication | Proceedings of Forum Acusticum 2011, 27 June-01 July, Aalborg, Denmark |
| Publisher | European Acoustics Association, EAA |
| Pages | 2121-2126 |
| Number of pages | 6 |
| ISBN (Print) | 978-84-694-1520-7 |
| Publication status | Published - 1 Dec 2011 |
| Event | 6th Forum Acusticum 2011 (FA 2011), Aalborg, Denmark. Duration: 27 Jun 2011 → 1 Jul 2011 |
Conference
| Conference | 6th Forum Acusticum 2011 (FA 2011), 27 June - 01 July 2011, Aalborg, Denmark |
|---|---|
| Country/Territory | Denmark |
| City | Aalborg |
| Period | 27/06/11 → 1/07/11 |
| Other | Forum Acusticum |
Fingerprint
Dive into the research topics of 'Simultaneous localization and identification of speakers in noisy and reverberant environments'. Together they form a unique fingerprint.