[Univ of Cambridge] [Dept of Engineering]

Rapid and Robust Environment Aware Processing


Project Description

One of the fundamental problems with deploying automatic speech recognition (ASR) systems is that they must be able to operate in a wide range of acoustic environments. As the acoustic environment may vary dramatically, for example moving from a quiet office to a moving car with high levels of background noise, it is essential that ASR systems can detect, and adapt to, these changing conditions. The overall aim of the project is to develop approaches that allow ASR systems to respond to changing acoustic conditions while maintaining high levels of recognition accuracy. The schemes developed should be flexible, in that they should be applicable to a wide range of tasks, for example both small and large vocabulary systems. At the same time, the computational load associated with the techniques should be tunable depending on the nature of the environment and the available resources. This project will build on the current research work on Joint Uncertainty Decoding (JUD), which has been applied to a range of tasks from digit strings (AURORA2) to large vocabulary continuous speech recognition (Broadcast News Transcription).

The research to be carried out may be split into three distinct areas:

  • Rapid environment adaptation. Two related issues need to be addressed for rapid environment adaptation. First, the estimation of the noise environment must be rapid, both in terms of the time delay incurred and the environment estimation process itself. Second, having estimated the environment parameters, the noise compensation process itself must have minimal computational overhead.
  • Environment change tracking/detection. This problem may be addressed in two distinct ways. First, by detecting when the environment has changed sufficiently to adversely affect recognition, and thus warrant updating the model parameters. Second, by using a scheme which continually monitors the environment and adapts the parameters.
  • Improved robustness. Though techniques such as uncertainty decoding yield significant gains in performance, there can be unacceptably large degradations in performance as the signal-to-noise ratio (SNR) decreases. Using hidden Markov models (HMMs) alone is unlikely to address this problem. An alternative is to use discriminative models, such as support vector machines (SVMs), in combination with HMMs. One initial direction is the code-breaking framework, which uses HMMs in conjunction with SVMs based on JUD-compensated generative kernels.

The project will extend the current implementation of JUD using HTK Version 3.4.
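
As a rough illustration of the first approach to environment change detection listed above (flagging a change only when the environment has drifted far enough to warrant updating the models), the following minimal sketch tracks a smoothed noise level and compares it with the level recorded at the last update. It is not the project's method and does not involve JUD or HTK: the class name, the log-energy measure, the threshold and the smoothing constant are all assumptions made for this example, and the input frames are assumed to contain noise only.

```python
import math


def frame_log_energy(frame):
    """Natural log of the mean-square energy of one frame of samples."""
    power = sum(s * s for s in frame) / len(frame)
    return math.log(power + 1e-10)  # small floor avoids log(0) on silent frames


class EnvironmentChangeDetector:
    """Hypothetical detector: flags a change when the smoothed noise level
    moves more than `threshold` (in log-energy units) away from the level
    recorded when the models were last updated."""

    def __init__(self, threshold=1.0, smoothing=0.9):
        self.threshold = threshold
        self.smoothing = smoothing
        self.reference = None  # level at the last model update
        self.current = None    # exponentially smoothed running level

    def update(self, noise_frame):
        """Process one noise-only frame; return True if a change is flagged."""
        level = frame_log_energy(noise_frame)
        if self.reference is None:
            # First frame initialises both estimates.
            self.reference = self.current = level
            return False
        self.current = (self.smoothing * self.current
                        + (1.0 - self.smoothing) * level)
        if abs(self.current - self.reference) > self.threshold:
            # A real system would re-estimate its compensation here.
            self.reference = self.current
            return True
        return False
```

The second approach in the list, continual monitoring and adaptation, would correspond to updating the compensation on every frame instead of waiting for the threshold to be crossed; the threshold trades adaptation cost against how stale the compensation is allowed to become.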

The project is funded by the Speech Technology Group, Toshiba Research Europe Ltd, and has a three-year duration starting in January 2008.


Personnel Associated with the Project


Project and Related Publications
