This paper demonstrates the potential of theoretically motivated learning methods for non-intrusive speech quality estimation, for which the state of the art is represented by the ITU-T P.563 standard. To construct our estimator, we adopt the speech features from P.563 but use a different mapping from features to quality estimates. In contrast to P.563, which assumes distortion classes to divide the feature space, our approach divides the feature space based on a clustering that is learned from the data using Bayesian inference. Despite using weaker modeling assumptions, we achieve accuracy comparable to P.563 in predicting mean opinion scores. Our work suggests Bayesian model evidence as an alternative to the correlation coefficient as a metric for determining the number of experts needed to model the data.
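To make the idea of experts that softly divide the feature space concrete, the following is a minimal sketch of a generic one-level mixture of experts with softmax gating. It is not the paper's hierarchical Bayesian model: the function names (`softmax`, `dot`, `moe_predict`), the linear form of the gates and experts, and the hand-picked parameter values are all assumptions made for illustration, and nothing is inferred from data here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def moe_predict(x, gate_weights, expert_weights):
    """Soft-gated mixture-of-experts prediction for one feature vector x.

    gate_weights[k] and expert_weights[k] are hypothetical linear
    parameters for gate k and expert k; the last entry of each is a bias.
    Returns the blended prediction and the gate probabilities.
    """
    x1 = list(x) + [1.0]  # append a constant 1.0 for the bias term
    gates = softmax([dot(g, x1) for g in gate_weights])
    experts = [dot(w, x1) for w in expert_weights]
    prediction = sum(g * e for g, e in zip(gates, experts))
    return prediction, gates


# Illustrative values only: two experts over a single feature.
# Zero gate weights give each expert equal responsibility.
gate_w = [[0.0, 0.0], [0.0, 0.0]]
expert_w = [[1.0, 0.0], [0.0, 3.0]]  # expert 0 outputs x, expert 1 outputs 3.0
pred, gates = moe_predict([2.0], gate_w, expert_w)
```

In this sketch the gates assign each input a probability of belonging to each expert, so the partition of the feature space is soft rather than fixed in advance. In a Bayesian treatment such as the one the abstract describes, the gate and expert parameters would be inferred from data rather than set by hand, and the number of experts would be compared across candidate models.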
|Title of host publication||Second International Workshop on Quality of Multimedia Experience, IEEE Signal Processing Society, 2010, 21-23 June, Trondheim|
|Publication status||Published - 2010|
Mossavat, S. I., Amft, O. D., de Vries, B., Petkov, P. N., & Kleijn, W. B. (2010). A Bayesian hierarchical mixture of experts approach to estimate speech quality. In Second International Workshop on Quality of Multimedia Experience, IEEE Signal Processing Society, 2010, 21-23 June, Trondheim (pp. 200-205). https://doi.org/10.1109/QOMEX.2010.5516203