Abstract
In speech communication systems, such as voice-controlled systems, hands-free mobile
telephones, and hearing aids, the received microphone signals are degraded by room
reverberation, background noise, and other interference. This degradation may
render the speech completely unintelligible, and it also reduces the performance of automatic
speech recognition systems.
In the context of this work, reverberation is the process of multi-path propagation of a
sound from its source to one or more microphones. The received microphone
signal generally consists of the direct sound, reflections that arrive shortly after the
direct sound (commonly called early reverberation), and reflections that arrive after
the early reverberation (commonly called late reverberation). Reverberant speech
can be described as sounding distant, with noticeable echo and colouration. These
detrimental perceptual effects are primarily caused by late reverberation, and generally
increase with the distance between the source and the microphone. Conversely,
early reverberation tends to improve the intelligibility of speech. Together with
the direct sound, it is sometimes referred to as the early speech component.
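To make the split between the early speech component and late reverberation concrete, the sketch below cuts a room impulse response at a fixed time after the direct-path peak. The 50 ms boundary, the synthetic exponentially decaying impulse response, and the helper name `split_rir` are assumptions made for illustration; the abstract does not state where the dissertation places this boundary.

```python
import numpy as np

def split_rir(h, fs, t_early=0.05):
    """Split a room impulse response h into an early part (direct sound plus
    early reflections) and a late part. The boundary lies t_early seconds
    after the direct-path peak; 50 ms is an assumed, commonly used value."""
    n_direct = int(np.argmax(np.abs(h)))            # direct-path peak
    n_split = n_direct + int(round(t_early * fs))   # first "late" sample
    h_early = np.zeros_like(h)
    h_late = np.zeros_like(h)
    h_early[:n_split] = h[:n_split]
    h_late[n_split:] = h[n_split:]
    return h_early, h_late, n_split

# Synthetic RIR (illustrative only): unit direct path at n = 0 followed by
# exponentially decaying noise with an assumed T60 of 0.5 s.
fs = 16000
t60 = 0.5
delta = 3.0 * np.log(10.0) / t60                    # amplitude decay rate in 1/s
rng = np.random.default_rng(0)
n = np.arange(int(0.6 * fs))
h = 0.1 * rng.standard_normal(n.size) * np.exp(-delta * n / fs)
h[0] = 1.0                                          # direct sound arrives first

h_early, h_late, n_split = split_rir(h, fs)
# Early-to-late energy ratio in dB.
ratio_db = 10.0 * np.log10(np.sum(h_early**2) / np.sum(h_late**2))
print(f"early-to-late energy ratio: {ratio_db:.1f} dB")
```

With a 50 ms boundary measured from the direct sound, this early-to-late energy ratio is the room-acoustic clarity index C50; moving the microphone away from the source lowers the direct-sound energy and hence the ratio.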
Reducing the detrimental effects of reflections is evidently of considerable practical
importance, and is the focus of this dissertation. More specifically, the dissertation
deals with dereverberation techniques, i.e., signal processing techniques that reduce
these effects. Novel single- and multi-microphone
speech dereverberation algorithms are developed that aim to suppress
late reverberation, i.e., to estimate the early speech component. This is
done via so-called spectral enhancement techniques, which require a specific measure of
the late reverberant signal. This measure, called the spectral variance, can be estimated
directly from the received (possibly noisy) reverberant signal(s) using a statistical reverberation
model and a limited amount of a priori knowledge about the acoustic
channel(s) between the source and the microphone(s).
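As a minimal sketch of how a spectral variance estimate is used by a spectral enhancement technique, the code below applies a power spectral subtraction gain to short-time Fourier transform (STFT) coefficients. The gain rule, the lower bound `g_min`, and the function name `suppress_late_reverb` are assumptions chosen for brevity; the abstract does not specify which gain function the dissertation uses.

```python
import numpy as np

def suppress_late_reverb(Z, lambda_r, g_min=0.1, eps=1e-12):
    """Apply a spectral gain to STFT coefficients Z (frames x bins), given an
    estimate lambda_r of the late-reverberant spectral variance (same shape).

    The gain is power spectral subtraction, sqrt(max(1 - lambda_r/|Z|^2, 0)),
    bounded from below by g_min to limit musical noise. This rule is an
    assumed stand-in, not necessarily the one used in the dissertation."""
    power = np.abs(Z) ** 2
    gain = np.sqrt(np.maximum(1.0 - lambda_r / np.maximum(power, eps), 0.0))
    gain = np.maximum(gain, g_min)   # lower bound on the gain
    return gain * Z                  # estimate of the early speech component

# Toy example: three bins, each with an estimated late-reverb variance of 1.0.
Z = np.array([[2.0 + 0j, 1.0 + 0j, 0.5 + 0j]])
lambda_r = np.ones((1, 3))
S_early = suppress_late_reverb(Z, lambda_r)
print(S_early.real)   # approximately [[1.732, 0.1, 0.05]]
```

Bins whose power is dominated by the estimated late reverberation are attenuated down to the floor, while bins with a high early-to-late ratio pass almost unchanged; the quality of the result therefore hinges on the accuracy of `lambda_r`.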
In our work, an existing single-channel statistical reverberation model serves as a starting
point. The model is characterized by one parameter that depends on the acoustic
characteristics of the environment. We show that the spectral variance estimator based
on this model can only be used when the source-microphone distance is larger
than the so-called critical distance. This is, crudely speaking, the distance at which the
direct sound power equals the total power of all reflections. We develop a generalization of the statistical
reverberation model that incorporates the direct sound. This
model requires one additional parameter that is related to the ratio between the direct
sound energy and the sound energy of all reflections. The generalized model is used to
derive a novel spectral variance estimator. When the source-microphone distance
is smaller than the critical distance, using the novel estimator for dereverberation instead of
the existing one significantly improves the dereverberation performance.
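The sketch below illustrates a one-parameter estimator of this kind and the critical distance. It assumes, since the abstract does not say, that the single model parameter is a frequency-independent reverberation time T60 and that the model is an exponentially decaying noise process, as in Polack's statistical model. The generalized estimator with the additional direct-to-reverberant parameter is not reproduced. The critical distance uses the standard diffuse-field approximation for an omnidirectional source. All function names are hypothetical.

```python
import numpy as np

def decay_rate(t60):
    """Decay rate Delta (1/s) of the assumed model h(t) = b(t) exp(-Delta t):
    the energy envelope exp(-2 Delta t) drops by 60 dB after t60 seconds."""
    return 3.0 * np.log(10.0) / t60

def late_reverb_variance(lambda_z, t60, fs, hop, n_early):
    """Hypothetical one-parameter estimator of the late-reverberant spectral
    variance from the spectral variance lambda_z (frames x bins) of the
    reverberant signal:
        lambda_r(l, k) = exp(-2 Delta n_early / fs) * lambda_z(l - n_early/hop, k)
    The model treats all of lambda_z as reverberant, so direct-sound energy is
    scaled and delayed too; this is one plausible reason such an estimator
    fails close to the source."""
    if n_early < hop or n_early % hop != 0:
        raise ValueError("n_early must be a positive multiple of hop")
    d = n_early // hop                                  # delay in frames
    scale = np.exp(-2.0 * decay_rate(t60) * n_early / fs)
    lambda_r = np.zeros_like(lambda_z)
    lambda_r[d:] = scale * lambda_z[:-d]                # first d frames: no history
    return lambda_r

def critical_distance(volume, t60):
    """Critical distance (m) of an omnidirectional source in a diffuse field:
    r_c = sqrt(A / (16 pi)) with Sabine's absorption area A = 0.161 V / T60,
    i.e. roughly 0.057 * sqrt(V / T60)."""
    return np.sqrt(0.161 * volume / (16.0 * np.pi * t60))

# Example with assumed values: a 100 m^3 room with T60 = 0.5 s.
print(critical_distance(100.0, 0.5))                    # about 0.80 m
lam_r = late_reverb_variance(np.ones((10, 4)), t60=0.5, fs=16000,
                             hop=256, n_early=768)
```

In this example room, a microphone placed closer than about 0.8 m to the talker receives more direct-sound power than reflected power, which is exactly the regime in which a model without a direct-sound term overestimates the late reverberation.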
Single-microphone systems exploit only the temporal and spectral diversity of the received
signal. Reverberation, of course, also induces spatial diversity. To additionally
exploit this diversity, multiple microphones must be used, and their outputs must be
combined by a suitable spatial processor, such as the so-called delay-and-sum beamformer.
It is not a priori evident whether spectral enhancement is best applied before
or after the spatial processor. For this reason, we investigate both possibilities, as
well as a combination of the spatial processor and the spectral enhancement technique. An
advantage of the latter option is that the spectral variance estimator can be further
improved. Our experiments show that the use of multiple microphones yields a significant
improvement in perceptual speech quality.
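A delay-and-sum beamformer time-aligns the microphone signals on the direct sound and averages them. The sketch below assumes integer sample delays that are already known; in practice the time differences of arrival must be estimated and are generally fractional. The function name and the synthetic test signals are illustrative assumptions.

```python
import numpy as np

def delay_and_sum(x, delays):
    """Delay-and-sum beamformer for known integer sample delays.

    x: (M, N) array of microphone signals.
    delays: length-M non-negative integer delays (samples) with which the
    source reaches each microphone, relative to the earliest one.
    Each channel is advanced by its delay so the direct sound lines up, then
    the channels are averaged. The last max(delays) output samples are only
    partially aligned."""
    M, N = x.shape
    y = np.zeros(N)
    for m in range(M):
        d = int(delays[m])
        y[:N - d] += x[m, d:]          # advance channel m by d samples
    return y / M

# Synthetic check: one source reaches three microphones with assumed
# delays of 0, 3 and 7 samples.
rng = np.random.default_rng(1)
N = 1000
s = rng.standard_normal(N)
delays = [0, 3, 7]
x = np.zeros((3, N))
for m, d in enumerate(delays):
    x[m, d:] = s[:N - d]
y = delay_and_sum(x, delays)

# Independent sensor noise is attenuated by averaging.
noise = 0.5 * rng.standard_normal((3, N))
y_noisy = delay_and_sum(x + noise, delays)
residual = y_noisy[:N - 7] - s[:N - 7]
print(np.var(residual))                # expected to be close to 0.25 / 3
```

Averaging M channels reduces the power of spatially uncorrelated noise by roughly a factor M; reverberation is typically partly correlated across closely spaced microphones, so it is reduced less, which is one motivation for combining the beamformer with spectral enhancement.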
The applicability of the theory developed in this dissertation is demonstrated using a
hands-free communication system. Since hands-free systems are often used in noisy
and reverberant environments, the received microphone signal contains not only the
desired signal but also interference, such as room reverberation caused by the
desired source, background noise, and a far-end echo signal that results from the sound
produced by the loudspeaker. Usually, an acoustic echo canceller is used to
cancel the far-end echo. Additionally, a post-processor is used to suppress background
noise and residual echo, i.e., echo that could not be cancelled by the echo canceller.
In this work, a novel structure and post-processor for an acoustic echo canceller are
developed. The post-processor suppresses late reverberation caused by the desired
source, residual echo, and background noise. The late reverberation and late residual
echo are estimated using the generalized statistical reverberation model. Experimental
results convincingly demonstrate the benefits of the proposed system for suppressing
late reverberation, residual echo, and background noise. The proposed structure and
post-processor have low computational complexity and a highly modular structure, can
be seamlessly integrated into existing hands-free communication systems, and afford
a significant increase in listening comfort and speech intelligibility.
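For readers unfamiliar with acoustic echo cancellation, the sketch below shows a textbook normalized least-mean-squares (NLMS) echo canceller. It is not the structure proposed in the dissertation; the filter length, step size, and synthetic 16-tap echo path are assumptions. Its error signal is what a post-processor would receive: near-end speech plus noise plus whatever residual echo the adaptive filter has not removed.

```python
import numpy as np

def nlms_echo_canceller(far, mic, L=64, mu=0.5, eps=1e-6):
    """Textbook NLMS acoustic echo canceller (hypothetical helper).

    far: loudspeaker (far-end) signal; mic: microphone signal.
    Returns the error signal e (microphone minus estimated echo) and the
    final filter taps w, an estimate of the loudspeaker-microphone path."""
    w = np.zeros(L)          # adaptive estimate of the echo path
    buf = np.zeros(L)        # most recent far-end samples, newest first
    e = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far[n]
        e[n] = mic[n] - w @ buf                      # echo-cancelled output
        w += mu * e[n] * buf / (buf @ buf + eps)     # normalized update
    return e, w

# Synthetic check: white far-end signal through an assumed 16-tap echo path,
# without near-end speech or noise.
rng = np.random.default_rng(2)
g = rng.standard_normal(16) * np.exp(-0.3 * np.arange(16))
far = rng.standard_normal(5000)
mic = np.convolve(far, g)[:far.size]
e, w = nlms_echo_canceller(far, mic)
```

In a real room the echo path is far longer than the adaptive filter and keeps changing, so the tail of the echo, its late reverberant part, is never fully cancelled; this residual is what a post-processor of the kind described above is designed to suppress.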
Field | Value |
---|---|
Original language | English |
Qualification | Doctor of Philosophy |
Award date | 25 Jun 2007 |
Place of Publication | Eindhoven |
Print ISBN | 978-90-386-1544-8 |
Publication status | Published - 2007 |