The last decade or so has witnessed a lively interest in the study of prosodic phenomena. A major impetus for this new trend has come from speech technology, in particular from speech synthesis. In this field, a strong need has arisen to provide systems for text-to-speech conversion with more pleasant and natural-sounding variations in melody, rhythm, voice quality, and other prosodic features. Perhaps surprisingly, little or no knowledge could be derived from the existing literature on the prosody of many of the languages for which the technology was being developed. Indeed, more often than not, existing descriptions appeared to be too qualitative in nature and provided an insufficient basis for the definition of synthesis rules. Facing this discrepancy between phonetic supply and technological demand, speech researchers have attempted to unravel regularities in intonation, accentuation, temporal variation, and other descriptions by using new procedures for automatic rule extraction. These rules may lead to acceptable synthesis to the extent that they adequately model recurring patterns in the acoustic structure of speech. However, they are less likely to provide insight into the production and perception of prosody and its role in speech communication. This technology-driven approach, to be referred to as analysis-for-synthesis, may nevertheless be an efficient shortcut in an applications-oriented line of work. It may even be an indispensable support for the development of speech-based products and services in the near future. Yet, whenever possible, preference should be given to a research methodology that combines basic insight through experimentation with practical testing through application.
|Title of host publication||Progress in speech synthesis|
|Editors||J.P.H. van Santen, R.W. Sproat, J.P. Olive, J. Hirschberg|
|Place of Publication||New York|
|Publication status||Published - 1997|