Model compensation for noise-robustness replaces the Gaussian distributions in speech recognisers by “compensated” versions that approximate the corrupted speech. This toolkit looks at this process in detail, for one Gaussian. It measures how close to the “real” corrupted speech distribution model compensation comes. It takes one speech Gaussian and one noise Gaussian and combines them. It supports a number of well-known model compensation methods: VTS, DPMC, and IDPMC. The highlight is a new method, which uses Monte Carlo to approximate the corrupted-speech log-likelihood. To assess the different compensation methods, the toolkit approximates the cross-entropy (the KL divergence plus a constant) from the compensated distribution to the “real” distribution with another level of sampling.
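As a rough illustration of what drawing from the "real" corrupted speech distribution means, here is a minimal one-dimensional sketch. It is not the toolkit's code. The function names, the Gaussian parameters, and the fixed phase factor `alpha` are assumptions made for illustration, and the mismatch function is assumed to take the usual log-spectral form; the form the toolkit actually uses is specified in Mismatch function.

```python
import math
import random

def mismatch(x, n, alpha=0.0):
    """Assumed log-spectral mismatch function combining speech x and noise n.

    alpha is the phase factor; alpha = 0 ignores phase. This form is an
    assumption for illustration, not taken from the toolkit's source.
    """
    return math.log(math.exp(x) + math.exp(n)
                    + 2.0 * alpha * math.exp(0.5 * (x + n)))

def draw_corrupted(speech, noise, count, rng):
    """Draw corrupted speech samples from (mean, std) pairs for speech and noise."""
    samples = []
    for _ in range(count):
        x = rng.gauss(*speech)   # one speech sample
        n = rng.gauss(*noise)    # one noise sample
        samples.append(mismatch(x, n))
    return samples

# Hypothetical one-dimensional speech and noise Gaussians.
rng = random.Random(0)
corrupted = draw_corrupted(speech=(10.0, 1.0), noise=(8.0, 0.5),
                           count=5000, rng=rng)
```

The resulting samples are not Gaussian in general, which is why a compensated Gaussian can only approximate their distribution.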
This toolkit contains a very simple program that computes the cross-entropy for compensation methods. When it is called without command line parameters, it draws 5000 samples from the “real” distribution. The speech and noise models that are used are in examples/uh_4_3_component_2.tex and examples/oproom.txt. The mismatch function is specified as in Mismatch function. On the terminal, this appears:
Drawing 5000 samples.
It will then go through the list of compensation methods and initialise each of them:
Setting up distributions:
No compensation
VTS compensation
VTS compensation (phase-sensitive)
…
DPMC compensation (100 samples/component, 1 components)
DPMC compensation (1000 samples/component, 1 components)
DPMC compensation (10000 samples/component, 1 components)
DPMC compensation (50000 samples/component, 1 components)
…
Estimating DPMC requires sampling, so this process is slow.
When it has initialised all corrupted speech distributions, it starts approximating the cross-entropy for each of them. For more details on how this works, see the papers in the Bibliography. The short version is: the cross-entropy is the KL divergence plus a constant that cannot be computed. The lower the cross-entropy, the better the compensation method. The program will take a long while and output something like:
Computing cross-entropy for each, using all samples.
100% (5000/5000), 5s elapsed, 0s to go
41.4751583862 - No compensation
46.9960021195 - VTS compensation
38.1993191829 - VTS compensation (phase-sensitive)
37.0170928543 - VTS compensation (diagonal)
36.2945714989 - VTS compensation (diagonal) (phase-sensitive)
…
The first entry on each line gives the cross-entropy for the compensation method described after it.
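The estimate behind each of these numbers can be sketched as an average of negative log-likelihoods over samples from the "real" distribution. The following is a minimal one-dimensional sketch under assumed names, not the toolkit's implementation: `samples` stands in for the corrupted speech samples, and each `log_density` stands in for a compensated distribution.

```python
import math
import random

def gaussian_log_density(y, mean, std):
    """Log-density of a one-dimensional Gaussian at y."""
    return (-0.5 * math.log(2.0 * math.pi * std * std)
            - 0.5 * ((y - mean) / std) ** 2)

def cross_entropy(samples, log_density):
    """Monte Carlo estimate of -E_p[log q(y)] from samples y drawn from p."""
    return -sum(log_density(y) for y in samples) / len(samples)

# Hypothetical "real" distribution: a standard normal.
rng = random.Random(1)
samples = [rng.gauss(0.0, 1.0) for _ in range(5000)]

# A well-matched and a badly matched approximation of that distribution.
good = cross_entropy(samples, lambda y: gaussian_log_density(y, 0.0, 1.0))
poor = cross_entropy(samples, lambda y: gaussian_log_density(y, 2.0, 1.0))
```

In this sketch `good` comes out lower than `poor`, which matches the rule that a lower cross-entropy indicates a better approximation.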
The process of setting up distributions is slow, and the process of computing cross-entropies with transformed-space sampling is even slower. Therefore, it is possible to specify on the command line which compensation method to use. This uses a simplistic selection tool: the index of the compensation method. For example,
> ./cross-entropy.py 126
Drawing 5000 samples.
Setting up distributions:
Sequential importance sampling: quasi-conditional factorisation 2
Computing cross-entropy for each, using all samples.
100% (5000/5000), 12m 5s elapsed, 0s to go
68.1991869079 - Sequential importance sampling quasi-conditionals 2
Even when computation is split up per compensation method, some methods, such as transformed-space sampling with a large sample cloud, become very slow. Since the computation is separate for each corrupted speech sample, it is relatively straightforward to parallelise. This is how the results in the papers in the Bibliography were computed. However, the scripts that were used are particular to the Cambridge set-up, so they are not included in the distribution of the toolkit.
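Because the cross-entropy estimate is an average over independent samples, partial sums computed on separate chunks can simply be added up. The sketch below illustrates that idea only; it is not the Cambridge scripts. All names are hypothetical, `log_density` is a cheap stand-in for an expensive compensated log-likelihood, and a thread pool is used purely to keep the example self-contained (a real run would use separate processes or grid jobs).

```python
from concurrent.futures import ThreadPoolExecutor
import math

def log_density(y):
    """Stand-in for an expensive log-likelihood (a standard normal here)."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * y * y

def chunk_sum(chunk):
    """Return the summed log-likelihood and the sample count for one chunk."""
    return sum(log_density(y) for y in chunk), len(chunk)

def parallel_cross_entropy(samples, jobs=4):
    """Split a non-empty sample list into chunks and combine the partial sums."""
    size = max(1, math.ceil(len(samples) / jobs))
    chunks = [samples[i:i + size] for i in range(0, len(samples), size)]
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        partials = list(pool.map(chunk_sum, chunks))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return -total / count

samples = [0.1 * i for i in range(-20, 21)]
estimate = parallel_cross_entropy(samples, jobs=4)
```

Each chunk returns its count as well as its sum, so chunks of unequal size still combine into the correct overall average.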
However, the output of the parallel run of compensation methods, which was used to produce the graphs in the papers, can be found in Results. The cross-entropy for transformed-space sampling with a sample cloud size of 16384 should be close to the best obtainable cross-entropy. For practical purposes, therefore, this can be assumed to be the point where the KL divergence is 0. It should therefore be possible to compare the cross-entropy for a new compensation method with the numbers reproduced in Results.
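The comparison works because the unknown constant is the same for every compensation method, so it cancels when two cross-entropies are subtracted. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the two numbers are the illustrative values from the sample output earlier on this page rather than values from Results.

```python
def cross_entropy_gap(cross_entropy, reference_cross_entropy):
    """Difference of two cross-entropies; the unknown constant cancels.

    With the reference set to the best obtainable cross-entropy, this
    approximates the KL divergence of the method being assessed.
    """
    return cross_entropy - reference_cross_entropy

# Values copied from the sample output above (illustrative only).
vts = 46.9960021195
vts_phase_sensitive = 38.1993191829

gap = cross_entropy_gap(vts, vts_phase_sensitive)
```

Passing the transformed-space sampling figure from Results as the reference would turn the gap into an approximate KL divergence for the method under test.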
To run different experiments, cross-entropy.py needs to be edited. For example, to change the number of corrupted speech samples, set sampleNum. To add a new compensation method to the list of compensation methods, add a function that takes distributions of the speech, noise, and phase factor to curriedDistributions. It will also be necessary to implement the new compensation method in Python in the same way that, for example, vtsCompensate() is. The next section, Implementation, gives a reference for classes and functions in the current implementation that should be useful.
A new method of assessing compensation quality uses the unnormalised KL divergence over the whole HMM. This method is introduced in a paper that is currently under submission. The computational requirements then become so great that a framework is required that analyses the expression tree and farms out compute jobs over a number of servers. Additionally, a complete HMM is necessary as a speech model. Regretfully, none can be distributed with this toolkit, so some work is required to get this set-up to run. The directory ../RM should be the base directory for the experiments. In this directory, the Noise file should contain the log-spectral noise Gaussian, the Speech.gz file should contain a log-spectral HMM definition, and Stats should contain occupancy statistics generated with HTK. Then, the following will start a Python shell with a number of commands to control the evaluation of the cross-entropy:
starts the evaluation of cross-entropy for a number of compensation methods. If interrupted, next time it will start off where it stopped. The commands submit() and status() allow parallel execution on the Sun Grid Engine. Regretfully, it is impossible to distribute a set-up that works out of the box. However, if you have questions on running these experiments, do get in touch.