Temporal decomposition of a speech utterance results in a description of speech parameters in terms of overlapping target functions and associated target vectors. The former may correspond to articulatory gestures and the latter to ideal articulatory positions. Although developed for economical speech coding, this method also provides an interesting tool for deriving phonetic information from acoustic speech signals. The speech parameters used by Atal (1983) in proposing this method were the log-area parameters. Our modified temporal decomposition method (Van Dijk-Kappers and Marcus, 1987, 1989) also works with log-area parameters as input. However, the method is not restricted to these; in principle, most commonly used parameter sets can be used. In this paper we compare the results obtained with nine different sets of speech parameters, including log-area parameters, formants, reflection coefficients and band-filter parameters. The main criterion for good performance will be correspondence between target functions and phonemes or sub-phonemes. The phonetic relevance of the target vectors will also be considered, but in less detail. Speech signal resynthesis supplies yet another criterion; for those parameter sets which are transformable into the same parameter space, a reconstruction error will be defined and evaluated. From these experiments it can be concluded that log-area parameters form the most suitable parameter set available for temporal decomposition. In some respects band-filter parameters yield better results, but this set is not classified as the best due to properties related to resynthesis.
- articulatory gestures, speech analysis
- parameter sets
- target positions
- Temporal decomposition
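
As a rough illustration of the decomposition described in the abstract, the sketch below rebuilds a set of parameter tracks as a weighted sum of overlapping target functions, with the target vectors as weights, and then measures how far the result lies from a reference. It is only a minimal sketch under stated assumptions: the function names `reconstruct` and `rms_error`, the data layout, the numerical values, and the choice of a root-mean-square measure are all hypothetical and are not taken from the paper, which defines its own reconstruction error.

```python
# Illustrative sketch only: names, data layout, values, and the error
# measure are assumptions, not the method defined in the paper.

def reconstruct(target_vectors, target_functions):
    """Approximate each parameter track as a weighted sum of target functions.

    target_vectors[k][i]   : value of parameter i in the k-th target vector
    target_functions[k][n] : value of the k-th target function at frame n
    Returns frames[n][i], the reconstructed parameter i at frame n.
    """
    num_targets = len(target_vectors)
    num_params = len(target_vectors[0])
    num_frames = len(target_functions[0])
    return [
        [
            sum(target_vectors[k][i] * target_functions[k][n]
                for k in range(num_targets))
            for i in range(num_params)
        ]
        for n in range(num_frames)
    ]


def rms_error(original, reconstructed):
    """Root-mean-square difference over all frames and parameters (assumed measure)."""
    squared = [
        (o - r) ** 2
        for frame_o, frame_r in zip(original, reconstructed)
        for o, r in zip(frame_o, frame_r)
    ]
    return (sum(squared) / len(squared)) ** 0.5


# Two overlapping target functions over five frames (hypothetical values).
phi = [
    [1.0, 0.75, 0.5, 0.25, 0.0],
    [0.0, 0.25, 0.5, 0.75, 1.0],
]
# Two target vectors in a two-dimensional parameter space (hypothetical values).
targets = [
    [2.0, -1.0],
    [0.0, 3.0],
]
frames = reconstruct(targets, phi)
```

In this toy case the first frame equals the first target vector, the last frame equals the second, and the frames in between blend the two, which mirrors the reading of target vectors as ideal positions and target functions as the gestures that move between them.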