Abstract for woodland_arpa96

Proc. DARPA Speech Recognition Workshop '96, pp. 99-104


P.C. Woodland, M.J.F. Gales, D. Pye & V. Valtchev

April 1996

The HTK large vocabulary speech recognition system has previously shown very good performance on clean speech. This paper describes developments of the system aimed at recognising speech from the ARPA H3 task, which contains data with a relatively low signal-to-noise ratio recorded through unknown microphones. It is shown that a two-phase approach can be effective. The first phase derives an initial set of models that are better matched to the current conditions than models trained on clean speech. This is done using either single-pass retraining with multiple-microphone data, or parallel model combination (PMC), which combines HMMs trained on clean data with estimates of the convolutional and additive noise. The second phase provides more detailed environmental and speaker adaptation using maximum likelihood linear regression (MLLR), which estimates a set of linear transformations that adapt the model parameters to the current conditions. Experiments are reported on the 1994 ARPA CSR S5 (alternate microphones) and S10 (additive noise) spoke tasks, as well as on the 1995 ARPA CSR H3 task. The HTK system yielded the lowest error rates in both the H3-P0 and H3-C0 tests.
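The two phases can be illustrated with a minimal sketch that works on Gaussian mean vectors only. The first function is a simplified stand-in for PMC: it uses the "log-add" approximation in the log-spectral domain, combining a clean-speech mean, a convolutional (channel) offset and an additive-noise mean. The paper's PMC may use a different approximation and domain (for example cepstral parameters with variance compensation), so treat this as an assumption. The second function applies an MLLR mean transform, mu_hat = A mu + b. Estimating A and b by maximising the likelihood of adaptation data is not shown. The function names and all numeric values are hypothetical.

```python
import math


def _log_add(x, y):
    """Numerically stable log(exp(x) + exp(y))."""
    m = max(x, y)
    return m + math.log(math.exp(x - m) + math.exp(y - m))


def pmc_log_add(clean_mean, noise_mean, channel=None):
    """Hypothetical log-add approximation to parallel model combination.

    All vectors are assumed to be log-spectral (log filterbank) means.
    Each corrupted-speech mean component is approximated as
        log(exp(mu_s + h) + exp(mu_n)),
    where h is the convolutional (channel) offset and mu_n the
    additive-noise mean. Variances are not compensated in this sketch.
    """
    if channel is None:
        channel = [0.0] * len(clean_mean)
    return [_log_add(s + h, n)
            for s, h, n in zip(clean_mean, channel, noise_mean)]


def mllr_adapt_mean(mean, A, b):
    """Apply a (given) MLLR mean transform: mu_hat = A mu + b.

    A and b are taken as inputs; their maximum likelihood estimation
    from adaptation data is outside the scope of this sketch.
    """
    return [sum(a_ij * m_j for a_ij, m_j in zip(row, mean)) + b_i
            for row, b_i in zip(A, b)]


if __name__ == "__main__":
    clean = [1.0, 2.0]   # hypothetical clean-speech log-spectral mean
    noise = [0.0, 0.0]   # hypothetical additive-noise log-spectral mean
    h = [0.1, -0.2]      # hypothetical channel (convolutional) offset

    # Phase 1: derive an initial model better matched to the noisy conditions.
    initial = pmc_log_add(clean, noise, h)

    # Phase 2: refine it with a hypothetical MLLR transform.
    A = [[0.9, 0.0], [0.0, 1.1]]
    b = [0.05, -0.1]
    print(initial, mllr_adapt_mean(initial, A, b))
```

In a real system a single MLLR transform would typically be shared by many Gaussians (one per regression class), which is what lets a small amount of adaptation data update the whole model set.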

