Abstract for gales_tr133

Cambridge University Engineering Department Technical Report CUED/F-INFENG/TR133

THE THEORY OF SEGMENTAL HIDDEN MARKOV MODELS

M. J. F. Gales and S. J. Young

June 1993

The most popular and successful acoustic model for speech recognition is the Hidden Markov Model (HMM). Using HMMs for speech recognition requires a series of assumptions about the speech waveform, some of which are known to be poor. In particular, the 'Independence Assumption' implies that each observation depends only on the state that generated it, not on neighbouring observations. This paper describes a new form of acoustic model, the Segmental Hidden Markov Model (SHMM), in which the effect of the 'Independence Assumption' on the observation likelihood is greatly reduced. In the SHMM, observations are still assumed to be independent given the state that generated them, but they are additionally conditioned on the mean of the segment of speech to which they belong. Re-estimation formulae are presented for training both single and multiple Gaussian Inter Mixture models, and a recognition algorithm is described. It is also shown that the standard HMM, in both the single Gaussian and the multiple Gaussian mixture cases, is a special case of the SHMM. Across a range of synthetic data, the new model is shown to give better recognition performance than the standard HMM.
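The abstract describes the SHMM only in words. The Python sketch below illustrates one way the idea of "conditioning on the segment mean" can be realised; it is an assumption made for illustration, not the formulation used in the report. Within a segment occupied by a single state, a latent segment mean m is assumed to be drawn from N(mean, between_var), and the observations are independent Gaussians around m with variance within_var. Integrating m out makes the observations within a segment correlated, and setting between_var to zero recovers the standard HMM state likelihood, mirroring the claim that the HMM is a special case of the SHMM. Scalar observations and single Gaussians are used for brevity, and all function and parameter names are hypothetical.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a scalar Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def hmm_segment_loglik(obs, mean, var):
    """Standard HMM state: observations independent given the state only."""
    return sum(gaussian_logpdf(o, mean, var) for o in obs)


def conditional_loglik(obs, seg_mean, within_var):
    """Hypothetical SHMM-style term: observations independent given the
    state AND a segment mean seg_mean."""
    return sum(gaussian_logpdf(o, seg_mean, within_var) for o in obs)


def segment_loglik(obs, mean, within_var, between_var):
    """Marginal log-likelihood of a segment with the latent segment mean
    m ~ N(mean, between_var) integrated out (assumed model).

    The segment covariance is within_var * I + between_var * 1 1^T, whose
    determinant and inverse have closed forms:
      det  = within_var**(T-1) * (within_var + T * between_var)
      quad = (sum(d**2) - between_var * sum(d)**2 / denom) / within_var
    where d = obs - mean and denom = within_var + T * between_var.
    """
    T = len(obs)
    d = [o - mean for o in obs]
    s1 = sum(d)                      # sum of deviations
    s2 = sum(x * x for x in d)       # sum of squared deviations
    denom = within_var + T * between_var
    quad = (s2 - between_var * s1 * s1 / denom) / within_var
    logdet = (T - 1) * math.log(within_var) + math.log(denom)
    return -0.5 * (T * math.log(2.0 * math.pi) + logdet + quad)
```

For example, a segment whose frames are all consistently offset from the state mean, such as `[1.0, 1.0, 1.0, 1.0]` with `mean=0.0`, `within_var=0.1` and `between_var=1.0`, scores higher under `segment_loglik` than under `hmm_segment_loglik` with the same per-frame variance of 1.1, because the sketch attributes the shared offset to the segment mean rather than treating every frame as an independent deviation.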
