On the performance of speech output in a practical setting

E.A.M. Klabbers, R.P.G. Collier

    Research output: Contribution to journal > Article > Academic > peer-review

    Abstract

    In spoken dialogue systems, in which humans interact with computers over the telephone, it is essential that the voice output of the system be of high quality. Both the intelligibility and the naturalness of the output should be sufficiently high. There are several techniques for providing a system with speech output, each with its own advantages and disadvantages. This paper discusses a formal evaluation experiment of three speech output techniques. Natural speech was included as a reference condition. The speech was rated on intelligibility and fluency of the output. Additionally, the overall quality of the speech and its suitability for use in a commercial application were assessed. The results reveal significant differences between the techniques. Diphone synthesis still has an inferior quality compared to the other techniques, both in terms of intelligibility and fluency. Conventional phrase concatenation is quite intelligible, but scores less on fluency. IPO's phrase concatenation is by far the best technique.
    Original language: English
    Pages (from-to): 121-128
    Number of pages: 8
    Journal: IPO Annual Progress Report
    Volume: 33
    Publication status: Published - 1998

    Fingerprint

    Speech intelligibility
    Telephone
    Experiments

    Cite this

    Klabbers, E. A. M., & Collier, R. P. G. (1998). On the performance of speech output in a practical setting. IPO Annual Progress Report, 33, 121-128.
    @article{6ead3bd11f9b4365bacb58e920a5590f,
    title = "On the performance of speech output in a practical setting",
    author = "E.A.M. Klabbers and R.P.G. Collier",
    year = "1998",
    language = "English",
    volume = "33",
    pages = "121--128",
    journal = "IPO Annual Progress Report",
    issn = "0921-2566",

    }
