Abstract for ahadi_thesis

PhD Thesis, University of Cambridge

BAYESIAN AND PREDICTIVE TECHNIQUES FOR SPEAKER ADAPTATION

Seyed Mohammad Ahadi-Sarkani

January 1996

HMM-based speech recognition systems have recently demonstrated impressive recognition performance. Many of these systems attempt to provide low error rates for a large range of speakers. However, the performance of these speaker independent systems is generally inferior to that of speaker dependent systems trained for a specific speaker.

This thesis addresses the problem of speaker adaptation, in which small amounts of speaker-specific data are used to improve speaker independent performance. Two different approaches to solving this problem are considered: a Bayesian model parameter adaptation technique and a model parameter prediction technique.

The Bayesian model adaptation technique, also called maximum a posteriori (MAP) estimation, updates the HMM parameters using the available utterances from a new speaker together with prior information, in order to overcome the sparse training data problem. An implementation of this approach using the Forward-Backward algorithm is reported, and several issues regarding the implementation and prior parameter estimation are evaluated. Furthermore, the use of MAP estimation for supervised and unsupervised adaptation, in both batch and incremental adaptation modes, is discussed. A speaker clustering approach to prior parameter improvement is also introduced.
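As a rough illustration of how MAP estimation combines prior information with sparse adaptation data, the sketch below applies the commonly used MAP update for a single Gaussian mean. It is not taken from the thesis: the function name `map_mean_update`, the prior weight `tau`, and the assumption that per-frame occupation probabilities have already been computed (for example by the Forward-Backward algorithm) are all illustrative.

```python
def map_mean_update(prior_mean, tau, frames, occupancies):
    """Return a MAP estimate of one Gaussian mean vector.

    prior_mean  : mean of the prior (e.g. the speaker independent mean)
    tau         : hypothetical prior weight, must be > 0
    frames      : list of observation vectors from the new speaker
    occupancies : per-frame occupation probabilities for this Gaussian,
                  assumed to come from a Forward-Backward pass
    """
    dim = len(prior_mean)
    total_occ = sum(occupancies)

    # Occupancy-weighted sum of the adaptation frames.
    weighted_sum = [0.0] * dim
    for gamma, x in zip(occupancies, frames):
        for d in range(dim):
            weighted_sum[d] += gamma * x[d]

    # Interpolate between the prior mean and the data according to
    # how much adaptation data was actually assigned to this Gaussian.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + total_occ)
        for d in range(dim)
    ]


adapted = map_mean_update([0.0, 0.0], 2.0, [[1.0, 1.0], [3.0, 3.0]], [1.0, 1.0])
unseen = map_mean_update([0.5, -0.5], 2.0, [], [])
```

With no adaptation frames the estimate stays at the prior mean, and as the total occupancy grows it moves towards the weighted average of the new speaker's data. This is the behaviour that makes a MAP-style update usable with very little data.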

The second adaptation technique, Regression-based Model Prediction (RMP), is a predictive approach that uses linear regression to find the relationships between phone models in an HMM system. These relationships are then used to predict the parameters of unadapted or poorly adapted models from those of better trained models, which helps a model parameter adaptation scheme when only sparse training data is available. In this work, RMP is applied to models already adapted by MAP estimation for further improvement, and is found to be useful for very fast speaker adaptation. Several ways of improving the performance of an RMP adapted system are also reported, such as the use of multiple regression, dynamic setting of the regression order, and iterative RMP adaptation.
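The predictive idea can be sketched with an ordinary least-squares fit between one parameter of a well-trained model and the corresponding parameter of a poorly trained one, estimated across a set of training speakers. This is only a minimal single-regressor sketch under assumed names (`fit_simple_regression`, `predict_parameter`) and made-up numbers. The abstract does not give the exact RMP formulation, which also covers multiple regression and dynamic choice of regression order.

```python
def fit_simple_regression(source_values, target_values):
    """Least-squares fit of target = intercept + slope * source.

    source_values : one parameter of a well-trained model, one value
                    per training speaker (illustrative data)
    target_values : the matching parameter of the model to be predicted,
                    for the same training speakers
    """
    n = len(source_values)
    mean_x = sum(source_values) / n
    mean_y = sum(target_values) / n
    sxx = sum((x - mean_x) ** 2 for x in source_values)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(source_values, target_values)
    )
    slope = sxy / sxx  # assumes the source values are not all equal
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_parameter(slope, intercept, adapted_source_value):
    """Predict the poorly adapted parameter from a well-adapted one."""
    return intercept + slope * adapted_source_value


# Made-up values for three training speakers.
slope, intercept = fit_simple_regression([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
predicted = predict_parameter(slope, intercept, 4.0)
```

For a new speaker, the source value would be a parameter that has already been adapted reliably (for instance by MAP estimation), and the prediction stands in for a parameter that saw little or no adaptation data.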

Experiments on the above techniques, using data from the ARPA RM and WSJ databases, are described.

Keywords: speech recognition, Hidden Markov Models, speaker adaptation, maximum a posteriori estimation, regression-based model prediction.

(ftp:) ahadi_thesis.ps.Z (http:) ahadi_thesis.ps.Z
PDF (automatically generated from original PostScript document - may be badly aliased on screen):
(ftp:) ahadi_thesis.pdf | (http:) ahadi_thesis.pdf
