[Univ of Cambridge] [Dept of Engineering]

Mark Gales - Old MPhil Projects


Here are the projects that have been offered before. Please look at the papers and associated links for more information. If you are interested in any of these projects, it is important that you contact me so that we can discuss the work involved. For any queries or requests for further details, please contact me by email at mjfg@eng.cam.ac.uk.

Simple Computational Auditory Scene Analysis

    Computational auditory scene analysis attempts to extract individual acoustic "objects" from input that contains a mixture of sounds from different sources. This work will look at a simple technique for separating the speech of two different speakers using models trained on the individual speakers. For each time instant and each frequency sub-band, the scheme computes which of the two speakers was most likely to have generated the sound. This produces a simple mask that can be used to separate the speech of the two speakers. One of the issues in producing these masks is computational cost, since the sound may have been produced by any state of either model. A simple approximation has been proposed that speeds up this process enough to make the computation tractable. This project involves building a simple system to unmix speech from two known speakers and investigating possible refinements to the system described in the paper below.

  • S. Roweis, (2000), One Microphone Source Separation. Proceedings NIPS 2000.
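
    The masking idea above can be illustrated with a minimal sketch. It is not the system from the paper: it assumes each speaker is modelled by a small codebook of log-spectral prototypes (the names codebook_a and codebook_b are hypothetical), and that the mixture in each sub-band is close to the louder of the two sources. The exhaustive search over every pair of states is the costly step that the approximation mentioned above is meant to avoid.

```python
from itertools import product


def pair_error(frame, proto_a, proto_b):
    """Squared error between a mixture frame and the prediction for one state pair."""
    # Assumption: in the log domain the mixture is roughly the louder source per band.
    return sum((x - max(a, b)) ** 2 for x, a, b in zip(frame, proto_a, proto_b))


def binary_masks(mix, codebook_a, codebook_b):
    """Return boolean time-frequency masks (mask_a, mask_b) for a two-speaker mixture.

    mix: list of frames, each a list of log-energies, one per sub-band.
    codebook_a, codebook_b: hypothetical per-speaker lists of log-spectral prototypes.
    """
    mask_a, mask_b = [], []
    for frame in mix:
        # Brute-force search over all (state of A, state of B) combinations.
        proto_a, proto_b = min(
            product(codebook_a, codebook_b),
            key=lambda pair: pair_error(frame, pair[0], pair[1]),
        )
        # Each cell goes to the speaker predicted to dominate it (ties go to A).
        row = [a >= b for a, b in zip(proto_a, proto_b)]
        mask_a.append(row)
        mask_b.append([not m for m in row])
    return mask_a, mask_b


# Toy example with two sub-bands and two prototypes per speaker.
codebook_a = [[10.0, 0.0], [0.0, 0.0]]
codebook_b = [[0.0, 8.0], [0.0, 0.0]]
mix = [[10.0, 8.0], [10.0, 0.0], [0.0, 8.0]]
mask_a, mask_b = binary_masks(mix, codebook_a, codebook_b)
```

    Applying mask_a to the mixture spectrogram and resynthesising would give an estimate of the first speaker. The search cost grows with the product of the two codebook sizes, which is why a cheaper approximation matters for realistic models.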

Complementary System Generation

    The current large vocabulary speech recognition systems used for evaluations typically combine multiple systems using, for example, ROVER. Though performance gains have been obtained using these schemes, no systematic approach for generating systems that complement one another has been investigated. This is the aim of this project.

    Boosting is a technique for sequentially training and combining a collection of classifiers in such a way that later classifiers make up for deficiencies in earlier classifiers. In this fashion multiple classifiers may be trained and used. Recently it has been applied to a state-of-the-art speech recognition system[1]. This project will look at boosting speech recognition systems of varying complexity. The trade-offs between number of parameters and recognition performance will be investigated, comparing systems trained using conventional techniques with multiple classifiers trained using boosting. For more information about boosting see the references in the paper below.

  • G. Zweig and M. Padmanabhan, (2000), Boosting Gaussian Mixtures in an LVCSR System. Proceedings ICASSP 2000.
  • See the Boosting web page for a variety of related papers.
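
    As a rough illustration of the sequential reweighting described above, the following is a minimal sketch of generic discrete AdaBoost with one-dimensional threshold classifiers ("stumps") on toy data. It is not the scheme used in the paper for an LVCSR system; the function names and the data are purely illustrative.

```python
import math


def stump_predict(x, threshold, polarity):
    """Threshold classifier returning +1 or -1."""
    return polarity if x >= threshold else -polarity


def train_stump(xs, ys, weights):
    """Pick the stump with the lowest weighted error on the training set."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(
                w for x, y, w in zip(xs, ys, weights)
                if stump_predict(x, threshold, polarity) != y
            )
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds):
    """Train a weighted ensemble of stumps; labels must be +1 or -1."""
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Misclassified samples gain weight, so the next stump focuses on them.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for x, y, w in zip(xs, ys, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def ensemble_predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Toy data that no single stump can classify correctly.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, rounds=3)
predictions = [ensemble_predict(ensemble, x) for x in xs]
```

    On this toy set a single stump misclassifies at least two points, while the three-stump ensemble fits all six. The same trade-off, several small boosted classifiers versus one larger conventionally trained model, is what the project sets out to study for speech recognition systems.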

SVMs for Speaker Identification

    Speaker identification and speaker verification are important aspects of speech technology. In speaker identification the task is to decide which of a set of enrolled speakers is trying to, for example, access the computing system. Speaker verification, by contrast, is a binary decision on whether to accept that a speaker is who they claim to be. In recent years, support vector machines (SVMs) have been shown to be a powerful classifier. SVMs have often outperformed other classification schemes, such as multi-layer perceptrons and Gaussian mixture models. However, in their standard form SVMs are strictly only applicable to static, binary problems.

    This project will examine the use of SVMs with variable-length speech signals for speaker verification. The normalisation of the variable-length signal is achieved using a recent extension to SVMs, Fisher kernels. These use generative models to determine the kernel space in which the support vectors are produced. Various forms of generative model and partitioning of the feature space will be investigated.

    The project will make use of standard SVM training toolkits, such as SVMlight, which has been extended to handle Fisher kernels.

  • N.D. Smith and M.J.F. Gales, (2002), Using SVMs to Classify Variable Length Speech Patterns. Technical Report CUED/F-INFENG/TR.412, April 2002 (revised version).
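
    To show how a generative model can map signals of different lengths into a fixed-size space, here is a minimal sketch of a Fisher score for a one-dimensional Gaussian mixture model. It assumes the score is taken with respect to the component means only and that the Fisher information matrix is replaced by the identity; both are simplifications made here for illustration, not necessarily the choices made in the report. All function names are hypothetical.

```python
import math


def gmm_posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given one observation."""
    likes = [
        w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
        for w, m, v in zip(weights, means, variances)
    ]
    total = sum(likes)  # toy code: assumes x is not so far out that this underflows
    return [like / total for like in likes]


def fisher_score(seq, weights, means, variances):
    """Gradient of the sequence log-likelihood with respect to the component means.

    The result has one entry per component, whatever the length of seq.
    """
    score = [0.0] * len(means)
    for x in seq:
        post = gmm_posteriors(x, weights, means, variances)
        for k, (g, m, v) in enumerate(zip(post, means, variances)):
            score[k] += g * (x - m) / v
    return score


def fisher_kernel(seq_a, seq_b, gmm):
    """Inner product of Fisher scores (identity assumed for the Fisher information)."""
    ua = fisher_score(seq_a, *gmm)
    ub = fisher_score(seq_b, *gmm)
    return sum(a * b for a, b in zip(ua, ub))


# Two sequences of different lengths map to score vectors of the same size.
gmm = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])  # weights, means, variances
short_seq = [0.9, 1.1]
long_seq = [-1.2, -0.8, -1.0, -1.1]
k_value = fisher_kernel(short_seq, long_seq, gmm)
```

    A kernel value computed this way could be supplied to an SVM trainer that accepts user-defined kernels. Richer generative models, or scores with respect to variances and weights, would change only the fisher_score function.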