One way of making speech recognisers more robust to noise is model compensation. Rather than enhancing the incoming observations, model compensation techniques replace a recogniser’s clean speech distributions by distributions over the corrupted speech. The corrupted speech distribution for one clean speech component has no closed form, but is usually assumed Gaussian. Even new model compensation methods apply this approximation, though its impact has never been quantified.
This Python program, written by Rogier van Dalen, implements a non-parametric method that computes the corrupted speech likelihood from the speech and noise distributions and a mismatch function. The method uses sampling and is exact in the limit, as the number of samples grows. It therefore gives a theoretical bound for model compensation.
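To make the idea of a sampling-based likelihood concrete, here is a minimal one-dimensional sketch. It is not the algorithm this toolkit implements: it is plain Monte Carlo integration, and it assumes the mismatch function y = log(exp(x) + exp(n)) without a phase factor, with Gaussian speech x and noise n. The function names and the (mean, standard deviation) parameter pairs are made up for illustration.

```python
import math
import random


def gaussian_pdf(value, mean, std):
    """Density of a one-dimensional Gaussian."""
    z = (value - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def corrupted_likelihood(y, speech, noise, num_samples=20000, seed=0):
    """Monte Carlo estimate of p(y) for y = log(exp(x) + exp(n)).

    speech and noise are (mean, std) pairs for x and n (an assumption of
    this sketch, not the toolkit's representation).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = rng.gauss(*speech)
        ratio = math.exp(x - y)  # exp(x) / exp(y)
        if ratio >= 1.0:
            continue  # the speech alone already exceeds y: no noise value fits
        # The noise value that, combined with this x, yields exactly y.
        n = y + math.log1p(-ratio)
        # Jacobian |dn/dy| of the change of variables from n to y.
        jacobian = 1.0 / (1.0 - ratio)
        total += gaussian_pdf(n, *noise) * jacobian
    return total / num_samples


print(corrupted_likelihood(1.0, (0.0, 1.0), (0.0, 1.0)))
```

The estimate is unbiased and converges to the true density as num_samples grows, which is the sense in which a sampling method can be exact in the limit. Its variance becomes large when one source dominates the other, which is one reason a practical method needs a more careful sampling scheme.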
Though the likelihood calculation is computationally expensive, it enables a fine-grained assessment of compensation techniques, based on the KL divergence to the ideal compensation for one component. For each method of model compensation, this program approximates the cross-entropy, which equals the KL divergence plus a constant: the entropy of the real corrupted speech distribution. Since the new sampling method yields the exact likelihood in the limit, it gives the point where the KL divergence is 0. This enables a comparison of compensation methods to the optimal compensation. Instead of compensating a complete speech recogniser, this program compensates just one Gaussian.
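The relation between cross-entropy and KL divergence can be illustrated with a small one-dimensional sketch. It assumes the mismatch function y = log(exp(x) + exp(n)) and Gaussian speech and noise given as (mean, standard deviation) pairs. All names are hypothetical, and the moment-matched Gaussian is only in the spirit of a sample-based scheme such as DPMC, not this toolkit's implementation.

```python
import math
import random


def gaussian_log_pdf(value, mean, std):
    """Log-density of a one-dimensional Gaussian."""
    z = (value - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))


def sample_corrupted(speech, noise, count, rng):
    """Draw corrupted speech samples y = log(exp(x) + exp(n))."""
    return [math.log(math.exp(rng.gauss(*speech)) + math.exp(rng.gauss(*noise)))
            for _ in range(count)]


def fit_gaussian(samples):
    """Maximum-likelihood (moment-matched) Gaussian for the samples."""
    mean = sum(samples) / len(samples)
    variance = sum((s - mean) ** 2 for s in samples) / len(samples)
    return mean, math.sqrt(variance)


def cross_entropy(samples, log_density):
    """Estimate -E_p[log q(y)] from samples y drawn from p."""
    return -sum(log_density(y) for y in samples) / len(samples)


rng = random.Random(0)
speech, noise = (0.0, 2.0), (1.0, 1.0)
train = sample_corrupted(speech, noise, 20000, rng)
test = sample_corrupted(speech, noise, 20000, rng)

matched = fit_gaussian(train)  # Gaussian fitted to corrupted samples
uncompensated = speech         # clean speech Gaussian used as-is

h_matched = cross_entropy(test, lambda y: gaussian_log_pdf(y, *matched))
h_uncompensated = cross_entropy(test, lambda y: gaussian_log_pdf(y, *uncompensated))

# Both cross-entropies contain the same unknown entropy of p, so their
# difference equals the difference between the two KL divergences.
kl_gap = h_uncompensated - h_matched
print(h_matched, h_uncompensated, kl_gap)
```

Because the entropy term cancels, methods can be ranked by cross-entropy alone. Turning a cross-entropy into an absolute KL divergence needs the entropy of the real corrupted speech distribution, which is what an exact likelihood makes available.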
Two potential uses for this software are envisioned. The first potential use is to reproduce the results in the papers (see Bibliography), for example with different Gaussians for the clean speech and the noise. For information on how to do this, see Usage.
The second potential use is to test compensation methods other than those the code currently supports. The code implements the well-known compensation methods VTS, DPMC and IDPMC (with and without a phase factor distribution). Additionally, it implements the method introduced in the papers listed in the Bibliography, which in the limit computes the corrupted speech likelihood exactly. Other compensation methods can be implemented within the framework that this toolkit uses (i.e. as Python code that returns a class that implements ProbabilityDistribution). For information on how to do this, see Implementation.
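As a rough picture of what such a compensation method might look like, the sketch below implements first-order VTS for one dimension. The toolkit's actual ProbabilityDistribution interface is not described here, so the stand-in base class, its log_density method, and the function vts_compensate are all hypothetical. The sketch also assumes the mismatch function y = log(exp(x) + exp(n)) without a phase factor, with speech and noise given as (mean, standard deviation) pairs.

```python
import math
from abc import ABC, abstractmethod


class ProbabilityDistributionSketch(ABC):
    """Hypothetical stand-in; the toolkit's real interface may differ."""

    @abstractmethod
    def log_density(self, y):
        """Return log q(y) for a scalar observation y."""


class GaussianSketch(ProbabilityDistributionSketch):
    """One-dimensional Gaussian over the corrupted speech."""

    def __init__(self, mean, variance):
        self.mean = mean
        self.variance = variance

    def log_density(self, y):
        deviation = y - self.mean
        return -0.5 * (math.log(2.0 * math.pi * self.variance)
                       + deviation * deviation / self.variance)


def vts_compensate(speech, noise):
    """First-order VTS for y = log(exp(x) + exp(n)) in one dimension.

    speech and noise are (mean, std) pairs (an assumption of this sketch).
    """
    (mu_x, std_x), (mu_n, std_n) = speech, noise
    # Mismatch function evaluated at the means, the expansion point.
    mean_y = max(mu_x, mu_n) + math.log1p(math.exp(-abs(mu_x - mu_n)))
    # dy/dx at the expansion point; dy/dn is one minus this value.
    jacobian_x = 1.0 / (1.0 + math.exp(mu_n - mu_x))
    variance_y = (jacobian_x * std_x) ** 2 + ((1.0 - jacobian_x) * std_n) ** 2
    return GaussianSketch(mean_y, variance_y)


compensated = vts_compensate((0.0, 1.0), (0.0, 1.0))
print(compensated.mean, compensated.variance, compensated.log_density(1.0))
```

Returning an object that only exposes a log-density keeps the evaluation independent of how a method arrives at its approximation, so a non-Gaussian method could plug into the same comparison.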
This work was supported by Toshiba Research Europe Ltd., Cambridge Research Laboratory (Rogier van Dalen’s PhD, supervised by Dr Mark Gales), and by EPSRC Project EP/I006583/1 (Generative Kernels and Score Spaces for Classification of Speech) within the Global Uncertainties Programme.