Mark Gales - 4th Year Projects
Transcribing YouTube: Who Spoke What When?
Infinite Gaussian Mixture Models for Speech Recognition
There are some notes on statistical pattern processing on-line.
Compressive Sensing for Speech Recognition (F-MJFG-1)
This project combines parametric and non-parametric approaches to speech recognition that address all of these problems. The work will use parametric models to map the variable-length speech data to a fixed-length feature vector, a score. This handles the time-varying aspects of the acoustic signal. By appropriately modifying the generative model it is possible to handle changes in the noise conditions and speaker.
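The mapping from variable-length data to a fixed-length score can be illustrated with a minimal sketch. It assumes, purely for illustration, that each generative model is a single 1-D Gaussian rather than the HMM-style models a real recogniser would use; the names `score_vector`, `models`, `short_utt` and `long_utt` are hypothetical. Each model contributes its average log-likelihood and the average derivative of the log-likelihood with respect to its mean, so the output length depends only on the number of models, not on the length of the utterance.

```python
import math


def gaussian_loglik(x, mean, var):
    """Log density of a scalar observation under a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def score_vector(sequence, models):
    """Map a variable-length sequence to a fixed-length score vector.

    For every (mean, var) model, two features are produced:
    the average log-likelihood and the average derivative of the
    log-likelihood with respect to the model mean.
    """
    n_frames = len(sequence)
    features = []
    for mean, var in models:
        avg_ll = sum(gaussian_loglik(x, mean, var) for x in sequence) / n_frames
        avg_grad = sum((x - mean) / var for x in sequence) / n_frames
        features.extend([avg_ll, avg_grad])
    return features


# Hypothetical toy models and utterances (assumptions, not project data).
models = [(0.0, 1.0), (3.0, 1.0)]
short_utt = [0.1, -0.2, 0.3]
long_utt = [2.9, 3.1, 3.0, 2.8, 3.2, 3.0]

print(score_vector(short_utt, models))
print(score_vector(long_utt, models))
```

Both utterances yield four features even though their lengths differ, which is what allows a fixed-dimension classifier to be applied afterwards.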
In previous work SVMs have been used to generate a sparse, fixed representation for the decision boundaries. This project will compare this fixed sparse representation with a sparse representation dependent on the current word or sentence being evaluated. This will make use of the recently proposed Bayesian Compressive Sensing approach.
The performance of the system will be evaluated against existing SVM systems and standard speech recognition systems.
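To give a feel for a sparse representation that depends on the item being evaluated, the sketch below uses plain matching pursuit, a simple greedy method. This is a swapped-in stand-in and not Bayesian Compressive Sensing itself. It assumes the current utterance is a score vector (`signal`) and that a small dictionary of unit-norm vectors (`atoms`) is available; all names and values are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(signal, atoms, n_iter):
    """Greedy sparse approximation of `signal` over unit-norm `atoms`.

    Returns the coefficient list (mostly zeros) and the final residual.
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Correlate the residual with every atom and pick the best match.
        corrs = [dot(residual, atom) for atom in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(corrs[i]))
        coeffs[best] += corrs[best]
        # Remove the selected atom's contribution from the residual.
        residual = [r - corrs[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual


# Hypothetical dictionary: three axis-aligned atoms plus one mixed atom.
atoms = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.6, 0.8, 0.0],
]
signal = [2.0, 0.0, -1.0]  # built from atoms 0 and 2 only

coeffs, residual = matching_pursuit(signal, atoms, n_iter=2)
print(coeffs, residual)
```

Which atoms receive non-zero coefficients changes with `signal`, in contrast to an SVM, whose support vectors are fixed once training is complete.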
Transcribing YouTube: Who Spoke What When? (F-MJFG-2)
This project will apply the state-of-the-art speech recognition systems developed in the Speech Group to YouTube data from more challenging sources. The data supplied by Google consists of audio from a number of election speeches from the 2008 US Presidential election. This data has a number of problems associated with it, including a wide range of background noise and a highly spontaneous speaking style.
The project aims to extract three forms of information from the audio stream:
This allows a more informative transcription to be generated.
The project will take an existing Broadcast News Transcription system, updated to reflect the vocabulary of the 2008 elections, and examine the performance on the YouTube Election data against simpler Broadcast News style data. A scheme for detecting names within the transcription will then be developed and evaluated for extracting the actual name of the speaker (where available). Using this additional information the aim is to improve the performance of the system by incorporating, for example, information from previous speeches both in the form of text and audio information.
Infinite Gaussian Mixture Models for Speech Recognition (F-MJFG-3)
The aim of this project is to examine these forms of model for speech recognition. In order to initially simplify the process, standard generative models will be used to map the variable-length data to a fixed size and handle any requirements for speaker and environment changes. The output from these generative sequence score-spaces will then be modelled using an infinite GMM. If time allows, the scheme will be extended to infinite hidden Markov models where the complete observation sequence is directly modelled.
In previous work SVMs have been used to generate a sparse, fixed representation for the decision boundaries. This project will compare this fixed sparse representation with the infinite GMM and other Hierarchical Dirichlet Prior Processes.
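An infinite GMM is commonly formulated with a Dirichlet process prior over the mixture weights, so the number of components does not have to be fixed in advance. The sketch below draws weights from the stick-breaking construction of that prior, truncated at a finite number of sticks for practicality. The function name, the concentration value `alpha` and the truncation level are illustrative assumptions, and no inference over data is performed here.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Draw mixture weights from a truncated stick-breaking process.

    Each step breaks off a Beta(1, alpha) fraction of the remaining
    stick. Returns the weights and the unassigned remainder.
    """
    weights = []
    remaining = 1.0
    for _ in range(truncation):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining


rng = random.Random(0)  # fixed seed so the draw is repeatable
weights, remaining = stick_breaking_weights(alpha=2.0, truncation=10, rng=rng)
print([round(w, 3) for w in weights], round(remaining, 3))
```

A smaller `alpha` concentrates the mass on the first few components, while a larger value spreads it over many; in a full infinite GMM each weight would be paired with its own Gaussian mean and covariance.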