
Mark Gales - CSTIT Projects


Here are the projects that will be offered for the year 2001-2002. Please look at the papers and associated links for more information. If you are interested in any of these projects, it is important that you contact me so that we can discuss the work involved. For any queries or requests for further details, please contact me by email at mjfg@eng.cam.ac.uk.

If there are any projects in the areas of acoustic modelling and machine learning that you would like to see run, please email me with suggestions. A couple of projects that have been offered before, and which I am willing to run again if people are interested, are here.

For additional reading and references, the last few years' proceedings from ICASSP, Eurospeech and ICSLP are available.

Products of Gaussians

    In current classification tasks many systems make use of sets of "experts". These experts vary in complexity from single Gaussian distributions to neural networks and hidden Markov models. A standard way of combining these experts is as a mixture of experts (MoE). For example, the majority of state-of-the-art speech recognition systems model the output distributions of the hidden Markov model states using mixtures of Gaussian distributions (the experts in this case being the individual Gaussians). One reason for the popularity of MoEs is that they are easy to train using the EM algorithm. A recently proposed alternative to the mixture of experts is the product of experts.
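
    As a concrete illustration, the sketch below evaluates the log-likelihood of a mixture of diagonal-covariance Gaussian experts; the weights, means and variances are illustrative values, not taken from any particular system.

        import numpy as np

        def log_gaussian(x, mean, var):
            # Log density of a diagonal-covariance Gaussian.
            return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

        def mixture_log_likelihood(x, weights, means, variances):
            # MoE-style union of experts: p(x) = sum_m w_m N(x; mu_m, var_m).
            logs = np.array([np.log(w) + log_gaussian(x, m, v)
                             for w, m, v in zip(weights, means, variances)])
            a = logs.max()                      # log-sum-exp for stability
            return a + np.log(np.exp(logs - a).sum())

        # Illustrative two-component mixture on a 3-dimensional vector
        x = np.array([0.2, -0.1, 0.4])
        print(mixture_log_likelihood(x,
                                     weights=[0.6, 0.4],
                                     means=[np.zeros(3), np.ones(3)],
                                     variances=[np.ones(3), 2 * np.ones(3)]))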

    Products of experts (PoEs) form a very different model from the standard mixture of experts. In a PoE the final decision may be thought of as an intersection of a set of experts, whereas in the MoE case it is a simple union of experts. Thus, if a single expert "disagrees" with the data, this can result in that data being rejected by the PoE. This project will look at the product of Gaussians (PoG) model. By restricting each expert to be a single Gaussian, the training of each expert is dramatically simplified. A mixture of PoGs can then be used in the classification process. PoGs may also be viewed as a natural extension of multiple synchronous stream systems (as used in HTK), without explicitly enforcing the assumption of independence between streams.
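
    The intersection behaviour is easy to see analytically: the renormalised product of Gaussian densities is itself Gaussian, with a precision equal to the sum of the expert precisions. A minimal sketch, using illustrative numbers:

        import numpy as np

        def product_of_gaussians(means, variances):
            # The renormalised product of diagonal Gaussians is Gaussian:
            #   precision = sum of expert precisions
            #   mean      = precision-weighted average of expert means
            precisions = [1.0 / v for v in variances]
            var = 1.0 / np.sum(precisions, axis=0)
            mean = var * np.sum([p * m for p, m in zip(precisions, means)],
                                axis=0)
            return mean, var

        # Two experts that "disagree": the product concentrates between
        # them, with a variance smaller than either expert's.
        print(product_of_gaussians([np.array([0.0]), np.array([4.0])],
                                   [np.array([1.0]), np.array([1.0])]))
        # -> mean 2.0, variance 0.5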

  • G.E. Hinton, (1999), Products of Experts. Proceedings ICANN 1999.
  • C.K.I. Williams, F.V. Agakov and S.N. Felderhof, (2001), Products of Gaussians. Proceedings NIPS 2001.
  • See the Product of Experts publication web page of Geoff Hinton for a variety of related papers.

Multiple Regression Hidden Markov Models

    The standard frontends used in state-of-the-art speech recognition systems are based on PLP or MFCC feature vectors. A variety of other features may also be extracted from the speech signal, for example pitch and "formant-like" features. There are a number of ways of combining the different forms of information extracted from the speech waveform ranging from simply adding new elements to the feature vector to combining the outputs from independent classifiers. When deciding on the form of combination scheme to be used, it is important to determine whether the features extracted give information that discriminates between words, or give information about unwanted factors, such as the gender of the speaker.

    This project will examine combining additional information using a recently proposed model, the multiple regression HMM (MRHMM). This new form of model allows additional features that indicate, for example, attributes of the speaker to be used without dramatically increasing the number of model parameters. This project will look at theoretical extensions to the standard MRHMM and the performance of such schemes. A set of standard features will be combined with features such as a pitch-related parameter and features extracted from fitting Gaussian mixture models to the speech spectrum.
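
    As a rough sketch of the regression idea (assuming a single Gaussian output distribution and treating the auxiliary feature as observed; the variable names are illustrative, not the notation of the MRHMM paper), the state mean becomes a linear function of the auxiliary features:

        import numpy as np

        def regressed_state_log_likelihood(x, xi, mu0, B, var):
            # Gaussian state output density whose mean is regressed on an
            # auxiliary feature vector xi: mu(xi) = mu0 + B @ xi. Only the
            # regression matrix B is added per state, rather than extending
            # the feature vector and covariance.
            mean = mu0 + B @ xi
            return -0.5 * np.sum(np.log(2 * np.pi * var)
                                 + (x - mean) ** 2 / var)

        # Illustrative: 3-d standard features, 1-d pitch-related feature
        x = np.array([0.3, -0.2, 0.1])
        xi = np.array([0.8])
        B = np.array([[0.5], [-0.1], [0.2]])
        print(regressed_state_log_likelihood(x, xi, np.zeros(3), B,
                                             np.ones(3)))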

  • K. Fujinaga, M. Nakai, H. Shimodaira and S. Sagayama, (2001), Multiple Regression Hidden Markov Models. Proceedings ICASSP 2001.
  • M.N. Stuttle and M.J.F. Gales, (2001), A Mixture of Gaussians Front End for Speech Recognition. Proceedings Eurospeech 2001.

Locally Linear Embedding

    One of the key issues in any pattern processing scheme is how to extract a "good" set of features. Having a compact set of features is important both for robustly training classifiers and for visualisation. A variety of dimensionality reduction schemes have been proposed over the years. Typically these are based on linear transformations, in the form of either linear discriminant analysis or factor analysis.

    This project will examine the performance of a recently proposed scheme, locally linear embedding (LLE). Rather than performing a single global linear projection, multiple local projections are performed. In the original space, the relationship between each point and a subset of the points around it is estimated, typically by minimising the squared reconstruction error. Given these relationships, a low-dimensional embedding is found in which the points satisfy, as closely as possible in a least-squares sense, the same local linear relationships as in the original space. This project will examine the attributes of the subspaces generated from speech data and examine how these features may be used in current speech recognition systems.
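
    A compact sketch of the algorithm as described by Roweis and Saul is given below; the neighbourhood size and regularisation constant are illustrative, and the web page referenced below provides the authors' own matlab code.

        import numpy as np

        def lle(X, n_neighbors=5, n_components=2, reg=1e-3):
            # X: (n_samples, n_features) data matrix.
            n = X.shape[0]
            W = np.zeros((n, n))
            for i in range(n):
                # k nearest neighbours of point i (excluding itself)
                d = np.sum((X - X[i]) ** 2, axis=1)
                idx = np.argsort(d)[1:n_neighbors + 1]
                Z = X[idx] - X[i]                 # centred neighbours
                G = Z @ Z.T                       # local Gram matrix
                G += reg * np.trace(G) * np.eye(n_neighbors)
                w = np.linalg.solve(G, np.ones(n_neighbors))
                W[i, idx] = w / w.sum()           # reconstruction weights
            # Embed using the bottom eigenvectors of (I - W)^T (I - W),
            # discarding the constant eigenvector.
            M = (np.eye(n) - W).T @ (np.eye(n) - W)
            _, vecs = np.linalg.eigh(M)
            return vecs[:, 1:n_components + 1]

        # Illustrative: embed a noisy 3-d curve in 2 dimensions
        t = np.linspace(0, 4 * np.pi, 200)
        X = np.column_stack([np.cos(t), np.sin(t), t])
        Y = lle(X + 0.01 * np.random.randn(200, 3))
        print(Y.shape)  # (200, 2)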

  • S. Roweis and L. Saul, (2000), Nonlinear dimensionality reduction by locally linear embedding. Science, v.290, no.5500, Dec. 22, 2000, pp. 2323-2326.
  • See the Locally Linear Embedding web page for more details and matlab code.

Covariance Modelling Using Rank-1 Matrices

    A simple choice is normally made for the form of the covariance matrix to be used with continuous-density HMMs. Either a diagonal covariance matrix is used, with the underlying assumption that the elements of the feature vector are independent, or a full or block-diagonal matrix is used, where all or some of the correlations are explicitly modelled. Unfortunately, using full or block-diagonal covariance matrices dramatically increases the number of parameters per Gaussian component: with a typical 39-dimensional feature vector, a full covariance matrix has 780 free parameters, compared with 39 for a diagonal one. This limits the number of components which may be robustly estimated.

    In recent years alternatives to simple block structures have been investigated, including factor analysis and semi-tied covariance matrix systems. A generalisation of many of the proposed schemes can be described as a combination of appropriate rank-1 matrices. Here a set of simple, rank-1, basis matrices is trained. A linear combination of these basis matrices is then used as the covariance matrix for each component. Within this framework the complexity of the covariance model can be controlled very simply, by varying the number of basis matrices. This project will investigate the form and use of this new scheme.
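
    A minimal sketch of the combination itself, with illustrative dimensions (note that in the extended MLLT scheme of Olsen and Gopinath referenced below, the combination models the inverse covariance; the structure is identical):

        import numpy as np

        def combine_rank1(lambdas, basis):
            # sum_k lambda_k b_k b_k^T. The basis is shared across all
            # Gaussian components; only the K weights are stored per
            # component, so K sets the complexity between diagonal-like
            # (K = d) and full (K = d(d+1)/2) modelling.
            return sum(lam * np.outer(b, b)
                       for lam, b in zip(lambdas, basis))

        # Illustrative: d = 4, K = 6 shared rank-1 basis matrices
        rng = np.random.default_rng(0)
        d, K = 4, 6
        basis = rng.standard_normal((K, d))
        lambdas = rng.uniform(0.1, 1.0, K)   # positive, for definiteness
        Sigma = combine_rank1(lambdas, basis)
        print(np.linalg.eigvalsh(Sigma))     # all positive => valid covariance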

  • P.A. Olsen and R.A. Gopinath, (2001), Extended MLLT for Gaussian Mixture Models. Submitted to IEEE Transactions SAP.