In this correspondence, we present some preliminary results on using phonetic subword units in word recognition, as compared to whole-word templates. The phonetic subword units are specified either as phonelike units, with and without temporal structure, or as diphonelike units. Determining these subword units requires segmentation, labeling, and parameter estimation to be carried out simultaneously; this is done by an iterative two-stage algorithm that alternates between nonlinear time alignment and parameter estimation. Experiments were carried out on a connected-digit recognition task to study the usefulness of the subword unit representation and the effect of several versions of the subword specification on recognition performance. The best error rates for subword units are still larger than those for whole-word templates by a factor of 2 or more.
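The abstract gives no implementation details, so the following is only a minimal sketch of what an iterative two-stage procedure of this kind might look like, not the authors' method. It assumes each subword unit is modeled by a single mean feature vector, that frames are compared by squared Euclidean distance, and that each training utterance comes with a known sequence of unit labels. All names (`align`, `estimate`, `train`, `uniform_alignment`) are hypothetical.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def align(frames, units, means):
    """Stage 1: nonlinear time alignment by dynamic programming.

    Maps every frame to an index into `units` so that the path is monotonic,
    starts at the first unit, ends at the last unit, and gives each unit at
    least one frame. Requires len(frames) >= len(units).
    """
    T, U = len(frames), len(units)
    INF = float("inf")
    cost = [[INF] * U for _ in range(T)]
    back = [[0] * U for _ in range(T)]
    cost[0][0] = sq_dist(frames[0], means[units[0]])
    for t in range(1, T):
        for u in range(U):
            stay = cost[t - 1][u]
            move = cost[t - 1][u - 1] if u > 0 else INF
            if move < stay:
                best, back[t][u] = move, u - 1
            else:
                best, back[t][u] = stay, u
            if best < INF:
                cost[t][u] = best + sq_dist(frames[t], means[units[u]])
    # Trace the best path back from the last frame in the last unit.
    path = [U - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, cost[T - 1][U - 1]


def estimate(utterances, alignments):
    """Stage 2: re-estimate each unit's mean from the frames aligned to it."""
    sums = {}
    counts = defaultdict(int)
    for (frames, units), path in zip(utterances, alignments):
        for frame, u in zip(frames, path):
            label = units[u]
            if label not in sums:
                sums[label] = [0.0] * len(frame)
            sums[label] = [s + x for s, x in zip(sums[label], frame)]
            counts[label] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}


def uniform_alignment(n_frames, n_units):
    """Initial guess: split the frames evenly among the units."""
    return [min(t * n_units // n_frames, n_units - 1) for t in range(n_frames)]


def train(utterances, n_iter=10):
    """Alternate alignment and estimation until the segmentation stops changing."""
    alignments = [uniform_alignment(len(f), len(u)) for f, u in utterances]
    means = estimate(utterances, alignments)
    total = None
    for _ in range(n_iter):
        results = [align(f, u, means) for f, u in utterances]
        new_alignments = [path for path, _ in results]
        total = sum(c for _, c in results)
        if new_alignments == alignments:
            break
        alignments = new_alignments
        means = estimate(utterances, alignments)
    return means, alignments, total
```

Here `utterances` would be a list of `(frames, units)` pairs, for example `([[0.0], [0.1], [1.0]], ["a", "b"])`. The segmentation and labeling come out of the alignment stage, and the unit parameters come out of the estimation stage, which is why the two have to be iterated together. A unit with temporal structure could be approximated in this sketch by giving it several consecutive sub-unit labels, though that is an illustrative choice rather than anything stated in the abstract.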
Journal: IEEE Transactions on Acoustics, Speech, and Signal Processing
Publication status: Published - 1986