Mark Gales - CSTIT Projects
Here are the projects that will be offered for the year 2001-2002. Please
look at the papers and associated links for more information.
If you are interested in any of these projects it is important that you
contact me so that we can discuss the work involved in the project.
For any queries, or requests for further details, please contact me by email:
mjfg@eng.cam.ac.uk
If there are any projects in the areas of acoustic modelling and
machine learning that you would like to see run, please
email me
with suggestions. A couple of projects that have been
offered before, and that I am willing to run again if people
are interested, are here.
For additional reading and references, the last few years' proceedings from
ICASSP, Eurospeech and ICSLP
are available.
Products of Gaussians
In current classification tasks many systems make use of sets of
"experts". These experts vary in complexity from single Gaussian
distributions to neural networks to hidden Markov models. A standard
way of combining these experts together is as a mixture of experts
(MoE). For example, the majority of state-of-the-art speech recognition
systems model the output distributions of the hidden Markov model
states using mixtures of Gaussian distributions (the expert in this
case). One reason for the popularity of MoEs is that they are easy to
train using the EM algorithm. A recently proposed alternative to a
mixture of experts is the product of experts.
Products of experts (PoEs) form a very different model to the standard
mixture of experts. In PoEs the final decision may be thought of as an
intersection of a set of experts, rather than in the MoE case where it
is a simple union of experts. Thus, if a single expert "disagrees"
with the data, this can result in that data being rejected in the
PoE. This project will look at the product of Gaussian (PoGs) model.
By restricting each expert to be a single Gaussian,
the training of each expert is dramatically simplified. A mixture
of PoGs can then be used in the classification process.
PoGs may also be viewed as a natural extension to multiple
synchronous stream systems (as used in HTK), without explicitly enforcing
the assumption of independence between streams.
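The combination rule for Gaussian experts has a simple closed form: the (renormalised) product of Gaussian densities over the same variable is itself Gaussian, with the precisions adding and the mean becoming a precision-weighted average. A minimal sketch, for the univariate case (function name illustrative):

```python
import numpy as np

def product_of_gaussians(means, variances):
    """Combine univariate Gaussian experts by multiplying their densities.

    The product is (up to normalisation) a Gaussian whose precision is the
    sum of the expert precisions, and whose mean is the precision-weighted
    average of the expert means.
    """
    precisions = 1.0 / np.asarray(variances, dtype=float)
    var = 1.0 / precisions.sum()
    mean = var * (precisions * np.asarray(means, dtype=float)).sum()
    return mean, var

# Two experts "disagreeing" about the mean: the product sits between
# them, with a sharper (smaller-variance) distribution than either.
mean, var = product_of_gaussians([0.0, 2.0], [1.0, 1.0])
# mean = 1.0, var = 0.5
```

This sharpening is the intersection-like behaviour described above: each expert can only narrow the region of agreement, never widen it.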
- G.E. Hinton, (1999),
Products of Experts.
Proceedings ICANN 1999.
- C.K.I. Williams, F.V. Agakov and S.N. Felderhof, (2001),
Products of Gaussians.
Proceedings NIPS 2001.
- See Geoff Hinton's
Product of Experts
publication web page for a variety of related papers.
Multiple Regression Hidden Markov Models
The standard frontends used in state-of-the-art speech recognition
systems are based on PLP or MFCC feature vectors. A variety of
other features may also be extracted from the speech signal, for
example pitch and "formant-like" features.
There are a number of ways of combining the different forms of
information extracted from the speech waveform ranging from
simply adding new elements to the feature vector to combining
the outputs from independent classifiers. When deciding on
the form of combination scheme to be used, it is important to
determine whether the features extracted give information that
discriminates between words, or give information about unwanted
factors, such as the gender of the speaker.
This project will examine combining additional information using a
recently proposed model, the multiple regression HMM (MRHMM). This
new form of model allows additional features that indicate, for
example, attributes of the speaker, to be used without dramatically
increasing the number of model parameters. This project will look
at theoretical extensions to the standard MRHMM and the performance of
such schemes. A set of standard features will be combined with
features such as a pitch related parameter and features extracted
from fitting Gaussian mixture models to the speech spectrum.
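The parameter saving comes from the form of the state output distribution: rather than modelling the auxiliary features jointly with the standard features, the Gaussian mean is shifted by a linear regression on the auxiliary features. A minimal sketch of this idea, with illustrative names (the exact MRHMM formulation is in the Fujinaga et al. paper):

```python
import numpy as np

def mrhmm_mean(base_mean, regression_matrix, aux_features):
    """Sketch of a regression-adjusted Gaussian mean.

    The component mean is the base mean plus a linear regression on
    auxiliary features (e.g. a pitch-related parameter), so each extra
    auxiliary feature adds only one regression column per component,
    not a full joint covariance.
    """
    base_mean = np.asarray(base_mean, dtype=float)
    B = np.asarray(regression_matrix, dtype=float)   # (dim, n_aux)
    xi = np.asarray(aux_features, dtype=float)       # (n_aux,)
    return base_mean + B @ xi

# One auxiliary feature shifting a 2-dimensional mean.
mu = mrhmm_mean([1.0, 2.0], [[0.5], [1.0]], [2.0])
# -> [2.0, 4.0]
```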
- K. Fujinaga, M. Nakai, H. Shimodaira and S. Sagayama, (2001),
Multiple Regression Hidden Markov Models.
Proceedings ICASSP 2001.
- M.N. Stuttle and M.J.F. Gales, (2001),
A Mixture of Gaussians Front End for Speech Recognition.
Proceedings Eurospeech 2001.
Locally Linear Embedding
One of the key issues in any pattern processing scheme is how to extract
a "good" set of features. Having some compact set of features is important
for both robustly training classifiers and for visualisation. A variety of
dimensionality reduction schemes have been proposed over the years. Typically
these are based on linear transformations, either in the form of linear
discriminant analysis, or factor analysis.
This project will examine the performance of a recently proposed scheme,
locally linear embedding. Rather than performing a global linear
projection, multiple local projections are performed. In the original
space, the linear relationship between each point and a subset of its
nearest neighbours is estimated by minimising the squared
reconstruction error. A low-dimensional embedding is then found in
which the points best preserve, in a least-squares sense, these
local linear relationships from the original space. This project will
examine the attributes of the subspaces generated from speech
data and examine how these features may be used in current speech
recognition systems.
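The first stage of LLE, finding the reconstruction weights for a single point, reduces to a small constrained least-squares problem. A minimal sketch following Roweis and Saul (function name and regularisation constant are illustrative):

```python
import numpy as np

def reconstruction_weights(x, neighbours, reg=1e-3):
    """Weights over a point's neighbours that best reconstruct it.

    Minimises |x - sum_j w_j * n_j|^2 subject to sum_j w_j = 1, via the
    local covariance of the centred neighbours. A small regularisation
    term keeps the solve stable when there are more neighbours than
    input dimensions.
    """
    Z = np.asarray(neighbours, dtype=float) - np.asarray(x, dtype=float)
    C = Z @ Z.T                                   # local Gram matrix (k x k)
    C = C + reg * np.trace(C) * np.eye(len(C))    # regularise
    w = np.linalg.solve(C, np.ones(len(C)))       # solve C w = 1
    return w / w.sum()                            # enforce sum-to-one

# A point midway between two neighbours gets equal weights.
w = reconstruction_weights([0.0, 0.0], [[1.0, 0.0], [-1.0, 0.0]])
# -> [0.5, 0.5]
```

In the second stage (not shown), these weights are held fixed and the low-dimensional coordinates are found as the bottom eigenvectors of a sparse matrix built from them.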
- S. Roweis and L. Saul, (2000),
Nonlinear dimensionality reduction by locally linear embedding.
Science, vol. 290, no. 5500, Dec. 22, 2000, pp. 2323--2326.
- See the
Locally Linear Embedding
web page for more details and Matlab code.
Covariance Modelling Using Rank-1 Matrices
There is normally a simple choice made in the form of the
covariance matrix to be used with continuous-density HMMs. Either a
diagonal covariance matrix is used, with the underlying assumption
that elements of the feature vector are independent, or a full or
block-diagonal matrix is used, where all or some of the correlations
are explicitly modelled. Unfortunately when using full or
block-diagonal covariance matrices there tends to be a dramatic
increase in the number of parameters per Gaussian component, limiting
the number of components which may be robustly estimated.
In recent years alternatives to simple block structures have been
investigated including Factor Analysis and Semi-Tied Covariance
matrix systems. Many of these schemes can be generalised as
combinations of appropriate rank-1 matrices. Here a set of simple,
rank-1, basis matrices is trained.
A linear combination of these basis matrices is then used as
the covariance matrix for each component. Using this framework
a very simple scheme for controlling the complexity of the
covariance is possible. This project will investigate the form
and use of this new scheme.
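The combination itself is straightforward: each rank-1 basis matrix is an outer product of a vector with itself, and a component's covariance is a weighted sum of the shared bases. A minimal sketch (names illustrative; the full scheme, including precision-matrix variants, is in the Olsen and Gopinath paper):

```python
import numpy as np

def rank1_covariance(basis_vectors, weights):
    """Covariance built as a weighted sum of rank-1 basis matrices.

    The basis vectors are shared across components, so the per-component
    cost is one weight per basis, not a full matrix; varying the number
    of bases gives a simple complexity control between diagonal and
    full covariance modelling.
    """
    V = np.asarray(basis_vectors, dtype=float)   # (n_basis, dim)
    w = np.asarray(weights, dtype=float)         # (n_basis,)
    return sum(wi * np.outer(v, v) for wi, v in zip(w, V))

# Axis-aligned bases with weights 2 and 3 give a diagonal covariance.
cov = rank1_covariance([[1.0, 0.0], [0.0, 1.0]], [2.0, 3.0])
# -> [[2.0, 0.0], [0.0, 3.0]]
```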
- P.A. Olsen and R.A. Gopinath, (2001),
Extended MLLT for Gaussian Mixture Models.
Submitted to IEEE Transactions SAP.