Abstract for nolazco_tr128

Cambridge University Engineering Department Technical Report CUED/F-INFENG/TR128

CSS-PMC: A COMBINED ENHANCEMENT/COMPENSATION SCHEME FOR CONTINUOUS SPEECH RECOGNITION

J. A. Nolazco Flores and S. J. Young

June 1993

Training HMMs under the same conditions as those met in recognition makes the models learn not only the features of the speech, but also those of the environment. However, attempting to produce models for all possible environments is impractical. One way to solve this problem is to compensate models trained on clean speech to give "artificially" adapted models. The goal of these noise adaptation techniques is to reach the same recognition performance as would be obtained by training in the noisy conditions.

However, even training in noise can achieve only limited recognition performance, because the high variance at low SNR makes the features begin to overlap, thereby reducing discrimination. The problem becomes worse as the vocabulary grows. To improve recognition performance in very noisy environments, speech enhancement techniques should therefore be useful. Enhancement schemes can improve the SNR, minimise the variance, and emphasise the important features of the signal, but at the expense of signal distortion. If both signal distortion and noise are minimised, a signal with better features and lower variability is obtained.

In our earlier work, speech models were adapted to a signal enhanced by spectral subtraction using Parallel Model Compensation (PMC) in a scheme called SS-PMC. Although very good performance was demonstrated for the SS-PMC scheme, it requires an explicit word boundary detector, and this limits its use in practice. In order to avoid this drawback, a Continuous Spectral Subtraction (CSS) scheme has been developed.

In this new system, speech models are adapted to a signal enhanced by this CSS scheme. It will be shown that the signal enhanced by CSS can be represented, in the linear domain, as the noisy speech plus a correction term. SS-PMC transforms the noise and speech model parameters from the cepstral domain to the linear domain, adds these parameters and the SS correction term, and then creates an adapted model by returning to the cepstral domain. Therefore, SS-PMC can be modified to compensate for the CSS correction term in the linear domain. This modified version of SS-PMC will be called the CSS-PMC method.
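
The cepstral-to-linear round trip described above can be illustrated with a minimal sketch. It is not the report's implementation: it compensates mean vectors only, ignores variances, and assumes the cepstra are an orthonormal DCT of log-spectral values. The names dct_matrix, compensate_mean and correction_lin, and the small floor applied before the logarithm, are hypothetical choices made for this illustration; the actual form of the correction term is derived in the report, not here.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C (n x n), so that C @ C.T is the identity."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    C[0, :] /= np.sqrt(2.0)
    return C

def compensate_mean(speech_cep, noise_cep, correction_lin=None, floor=1e-10):
    """Combine speech and noise mean vectors in the linear spectral domain.

    Means-only approximation (an assumption of this sketch):
      cepstral -> log-spectral (inverse DCT) -> linear (exp),
      add speech, noise and an optional linear-domain correction term,
      then linear -> log-spectral (log) -> cepstral (DCT).
    """
    speech_cep = np.asarray(speech_cep, dtype=float)
    noise_cep = np.asarray(noise_cep, dtype=float)
    C = dct_matrix(len(speech_cep))

    # Cepstral domain to linear spectral domain.
    speech_lin = np.exp(C.T @ speech_cep)
    noise_lin = np.exp(C.T @ noise_cep)

    # Add the parameters, plus the correction term if one is supplied.
    combined = speech_lin + noise_lin
    if correction_lin is not None:
        combined = combined + np.asarray(correction_lin, dtype=float)

    # Hypothetical floor: keeps the logarithm defined if the correction
    # term drives a channel to zero or below.
    combined = np.maximum(combined, floor)

    # Back to the cepstral domain.
    return C @ np.log(combined)
```

For example, passing a correction term equal to the negated linear noise mean cancels the noise contribution in this sketch and returns the clean speech mean, which is a convenient sanity check on the domain transforms.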

The results obtained by the CSS-PMC technique are very encouraging, showing that adaptation techniques are very effective at compensating for the signal distortion that is a side effect of a CSS-based enhancement scheme.

(ftp:) nolazco_tr128.ps.Z (http:) nolazco_tr128.ps.Z
PDF (automatically generated from original PostScript document - may be badly aliased on screen):
(ftp:) nolazco_tr128.pdf | (http:) nolazco_tr128.pdf
