
Abstract for wong_tr108

Cambridge University Engineering Department Technical Report CUED/F-INFENG/TR108

STATE-BASED CEPSTRAL DOMAIN COMPENSATION FOR IMPROVED NOISY SPEECH RECOGNITION

G. Wong

July 1992

A mismatch between training and testing conditions substantially lowers the recognition accuracy of Hidden Markov Model (HMM) recognizers. An Expectation-Maximization (EM) framework is applied to this problem, operating on the baseline signal, the mismatched signal caused by car noise, and the state sequence through an N-state HMM source model. Non-iterative cepstral compensation schemes have been derived and implemented to remove the existing mismatch caused by noise. The N-vector word model of the baseline signal is characterized by the sample average of the speech vectors in each HMM state and the average state segmentation points. The Expectation step provides the mean squared error (MSE) of the acoustic mismatch between the two types of speech signals and the calculation of the state sequence. The Maximization step consists of the calculation of N state-based compensation vectors.
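As an illustration of the idea of N state-based compensation vectors, the following is a minimal sketch and not the report's implementation. It assumes that parallel baseline and noisy cepstral frames are available, that a state segmentation is already given as a list of boundary indices, and that each compensation vector is simply the difference between the per-state sample means. All function names are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def state_means(frames: Sequence[Vector], boundaries: Sequence[int]) -> List[Vector]:
    """Sample average of the frames in each state segment.

    `boundaries` holds N+1 frame indices; state i covers
    frames[boundaries[i]:boundaries[i + 1]].
    """
    means = []
    for start, end in zip(boundaries, boundaries[1:]):
        segment = frames[start:end]
        if not segment:
            raise ValueError("every state must contain at least one frame")
        dim = len(segment[0])
        means.append([sum(f[d] for f in segment) / len(segment) for d in range(dim)])
    return means


def compensation_vectors(clean: Sequence[Vector], noisy: Sequence[Vector],
                         boundaries: Sequence[int]) -> List[Vector]:
    """One shift vector per state: baseline state mean minus noisy state mean (assumed form)."""
    clean_means = state_means(clean, boundaries)
    noisy_means = state_means(noisy, boundaries)
    return [[c - n for c, n in zip(cm, nm)] for cm, nm in zip(clean_means, noisy_means)]


def apply_compensation(noisy: Sequence[Vector], boundaries: Sequence[int],
                       vectors: Sequence[Vector]) -> List[Vector]:
    """Add each state's shift vector to every noisy frame assigned to that state."""
    compensated = []
    for (start, end), shift in zip(zip(boundaries, boundaries[1:]), vectors):
        for frame in noisy[start:end]:
            compensated.append([x + d for x, d in zip(frame, shift)])
    return compensated


def mismatch_mse(clean: Sequence[Vector], other: Sequence[Vector]) -> float:
    """Mean squared error between two parallel sequences of cepstral frames."""
    total = sum((c - o) ** 2 for cf, of in zip(clean, other) for c, o in zip(cf, of))
    count = sum(len(cf) for cf in clean)
    return total / count


# Toy two-state example with made-up 2-dimensional "cepstra".
clean = [[1.0, 2.0], [1.0, 2.0], [3.0, 4.0], [3.0, 4.0]]
noisy = [[1.5, 2.5], [1.5, 2.5], [2.0, 5.0], [2.0, 5.0]]
boundaries = [0, 2, 4]

vectors = compensation_vectors(clean, noisy, boundaries)
compensated = apply_compensation(noisy, boundaries, vectors)
```

In this toy case the mismatch within each state is a constant offset, so the per-state shift removes it entirely. With real speech the shift corrects only the state mean, and the residual MSE depends on how accurate the state segmentation is.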

State-based cepstral means compensation applied to the training material has given good results on a noisy digit database. Because the compensation is word-dependent, its application to the test material needs to be hypothesis-driven. This hypothesis-driven application has not yielded very useful recognition results, especially for noisy speech at low signal-to-noise ratios. The robustness of the method is investigated in terms of the accuracy of the state segmentation boundaries and the applicability of the state-based cepstral means deviation vector at different signal-to-noise ratios. An iterative cepstral means shift technique is attempted and shown to improve on the baseline matched conditions (error rate reduction of 45%). The implementation aspects of state-based compensation are discussed throughout, with possible extensions for further improvements. Previous approaches to modelling mismatched signals in speech are also described.
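To make "hypothesis-driven" concrete, here is a rough sketch under strong simplifying assumptions. It is not the procedure used in the report. Each word hypothesis carries its own baseline state means and state-based shift vectors; the test utterance is compensated with each hypothesis's vectors in turn and scored against that hypothesis. A uniform segmentation and a squared-distance score stand in for HMM alignment and likelihood, and the vocabulary, values, and function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
# A word model here is (baseline state means, state-based shift vectors).
WordModel = Tuple[List[Vector], List[Vector]]


def uniform_boundaries(num_frames: int, num_states: int) -> List[int]:
    """Split the frames evenly across states (a stand-in for HMM alignment)."""
    if num_frames < num_states:
        raise ValueError("need at least one frame per state")
    return [(i * num_frames) // num_states for i in range(num_states + 1)]


def hypothesis_score(frames: Sequence[Vector], model: WordModel) -> float:
    """Mean squared distance between the compensated frames and the word's state means."""
    means, shifts = model
    bounds = uniform_boundaries(len(frames), len(means))
    total = 0.0
    for state, (mean, shift) in enumerate(zip(means, shifts)):
        for frame in frames[bounds[state]:bounds[state + 1]]:
            # Compensate with this hypothesis's shift, then compare to its mean.
            total += sum((x + d - m) ** 2 for x, d, m in zip(frame, shift, mean))
    return total / len(frames)


def recognise(frames: Sequence[Vector], models: Dict[str, WordModel]) -> Tuple[str, Dict[str, float]]:
    """Try every word hypothesis and return the one with the lowest score."""
    scores = {word: hypothesis_score(frames, model) for word, model in models.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Made-up one-dimensional example with two hypothetical words.
models = {
    "one": ([[0.0], [2.0]], [[-1.0], [-1.0]]),
    "two": ([[5.0], [5.0]], [[0.5], [0.5]]),
}
test_frames = [[1.0], [1.0], [3.0], [3.0]]

best_word, scores = recognise(test_frames, models)
```

Because every hypothesis applies its own compensation before scoring, an incorrect hypothesis can also move the noisy frames closer to its own model. This is one plausible reason why word-dependent compensation is harder to exploit on test material than on training material.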


(ftp:) wong_tr108.ps.Z (http:) wong_tr108.ps.Z
PDF (automatically generated from original PostScript document - may be badly aliased on screen):
  (ftp:) wong_tr108.pdf | (http:) wong_tr108.pdf

