Gaussian Selection

Gaussian selection is used to reduce the cost of computing likelihoods in HMM-based recognisers. It was first introduced by Enrico Bocchieri of AT&T in 1993 (E. Bocchieri, "Vector quantization for efficient computation of continuous density likelihoods", Proc. ICASSP, Vol. II, pp. II-692--II-695, Minneapolis, 1993). Shortly afterwards, a group at SRI proposed a similar technique (H. Murveit et al., "Techniques to achieve an accurate real-time large-vocabulary speech recognition system", Proc. ARPA Workshop on Human Language Technology, pp. 368-373, Plainsboro, N.J., March 1994). A brief description of these two techniques and their differences can be found in the two papers linked from this page.

The reason for this page is to publish an ERRATUM to the papers. In both papers I propose the use of a different distance measure, termed a "class-weighted" measure, to assign the Gaussian components to neighbourhoods. Since returning from ICSLP I have discovered that I had actually implemented another measure. The results therefore hold, but the maths on the page was wrong. Tests using the previously quoted class-weighted measure show that it performs less well than Bocchieri's measure (termed "weight" in the two papers). The measure actually implemented was based on a divergence-like metric for determining the distance between two components, proposed by Young and Woodland in 1994 (S.J. Young and P.C. Woodland, "State clustering in hidden Markov model-based continuous speech recognition", Computer Speech and Language, (8) 1994, pp. 369-383). In the Gaussian Selection case, a component is assigned to a codeword if the divergence between the component and the codeword is less than some tail threshold. See FOR CORRECT DISTANCE METRIC (previously quoted metric given to show change).
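
The assignment rule described above can be sketched in a few lines of Python. This is only an illustration and not the implementation used in the papers: the distance below is a variance-normalised, divergence-like distance between two diagonal-covariance Gaussians, assumed here for the sake of the example, and the exact metric is the one given in the erratum. The function names, the use of a variance for each codeword, and the threshold value are all hypothetical.

```python
import math


def divergence_like_distance(mean_a, var_a, mean_b, var_b):
    """Assumed distance between two diagonal-covariance Gaussians.

    Squared mean differences are normalised by the product of the two
    standard deviations, averaged over dimensions, then square-rooted.
    This form is an assumption made for illustration only.
    """
    dim = len(mean_a)
    total = 0.0
    for k in range(dim):
        diff = mean_a[k] - mean_b[k]
        total += diff * diff / math.sqrt(var_a[k] * var_b[k])
    return math.sqrt(total / dim)


def build_neighbourhoods(components, codewords, threshold):
    """For each codeword, list the components lying within the threshold.

    components and codewords are lists of (mean, variance) pairs.
    A component may appear in several neighbourhoods, or in none.
    """
    neighbourhoods = []
    for cw_mean, cw_var in codewords:
        shortlist = [
            index
            for index, (mean, var) in enumerate(components)
            if divergence_like_distance(mean, var, cw_mean, cw_var) < threshold
        ]
        neighbourhoods.append(shortlist)
    return neighbourhoods


# Toy data (hypothetical): three 2-D components and two codewords.
components = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([0.5, 0.5], [1.0, 1.0]),
    ([5.0, 5.0], [1.0, 1.0]),
]
codewords = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([5.0, 5.0], [1.0, 1.0]),
]

neighbourhoods = build_neighbourhoods(components, codewords, threshold=1.0)
print(neighbourhoods)  # [[0, 1], [2]]
```

Raising the threshold enlarges the neighbourhoods, so more components are evaluated per frame; lowering it shortens them, at the risk of excluding components that would have contributed to the likelihood.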

Paper which uses Gaussian Selection in a speaker-dependent word-spotting task, presented at ICASSP 1996.

Paper on the use of Gaussian Selection in large vocabulary continuous-speech recognition. This paper investigates the problems that Gaussian Selection introduces into the decoder. Based on the observations made, the paper proposes techniques to improve the selectivity of the process and, hence, reduce likelihood computation still further.


Please send bug reports/comments/suggestions to Kate Knill (kmk@eng.cam.ac.uk)