On the performance of speech output in a practical setting

E.A.M. Klabbers, R.P.G. Collier

    Research output: Contribution to journal › Article › Academic › peer-review

    Abstract

    In spoken dialogue systems, in which humans interact with computers over the telephone, it is essential that the voice output of the system be of high quality. Both the intelligibility and the naturalness of the output should be sufficiently high. There are several techniques for providing a system with speech output, each with its own advantages and disadvantages. This paper discusses a formal evaluation experiment comparing three speech output techniques. Natural speech was included as a reference condition. The speech was rated on intelligibility and fluency. Additionally, the overall quality of the speech and its suitability for use in a commercial application were assessed. The results reveal significant differences between the techniques. Diphone synthesis is still of inferior quality compared with the other techniques, in terms of both intelligibility and fluency. Conventional phrase concatenation is quite intelligible, but scores lower on fluency. IPO's phrase concatenation is by far the best technique.
    Original language: English
    Pages (from-to): 121-128
    Number of pages: 8
    Journal: IPO Annual Progress Report
    Volume: 33
    Publication status: Published - 1998
