Representation Learning for Single-Channel Source Separation and Bandwidth Extension

Mathias Zoehrer, Robert Peharz, Franz Pernkopf

Research output: Contribution to journal, article, academic, peer-reviewed

15 Citations (Scopus)

Abstract

In this paper, we use deep representation learning for model-based single-channel source separation (SCSS) and artificial bandwidth extension (ABE). Both tasks are ill-posed, so source-specific prior knowledge is required. In addition to well-known generative models such as restricted Boltzmann machines and higher-order contractive autoencoders, two recently introduced deep models, namely generative stochastic networks (GSNs) and sum-product networks (SPNs), are used for learning spectrogram representations. For SCSS, we evaluate the deep architectures on data from the 2nd CHiME speech separation challenge and provide results for four tasks: speaker-dependent, speaker-independent, matched noise condition, and unmatched noise condition. GSNs obtain the best PESQ and overall perceptual score on average in all four tasks. Similarly, frame-wise GSNs reconstruct the missing frequency bands in ABE best, as measured by frequency-domain segmental SNR. They significantly outperform SPNs embedded in hidden Markov models and the other representation models.
Original language: English
Pages (from-to): 2398-2409
Journal: IEEE Transactions on Audio, Speech, and Language Processing
Volume: 23
Issue number: 12
Status: Published - 2015
Externally published: Yes
