The MIL Speech Seminar series schedule for Lent Term 2006 was as follows:
27th February 2006 | Khe Chai Sim (MIL PhD) | fMPE and pMPE - A Discriminative Semi-Parametric Trajectory Model | Hidden Markov Models (HMMs) are widely used in speech recognition. For
reasons of efficiency, a series of assumptions are made about the speech
data, some of which are poor. One in particular is the "independence
assumption", under which observations are assumed to be conditionally
independent given the state. Thus, the output distribution associated
with an HMM state is constant. Existing ways to overcome this
limitation include the use of switching linear dynamical systems,
stochastic segment models, polynomial segment models, buried Markov
models and trajectory HMMs. To date, these models have had little success
in improving the performance of large vocabulary continuous speech
recognition systems. In this seminar, a discriminative semi-parametric
trajectory model will be presented. This model represents the Gaussian
mean vectors and covariance matrices as time varying parameters. These
time dependent parameters are modelled as a function of the location of
the current observation (and the neighbouring observations) in the
acoustic space, which is represented by a series of centroids. Model
parameters are discriminatively estimated using the Minimum Phone Error
(MPE) criterion.
One form of temporally varying mean vector is obtained by applying a
time dependent bias to the static Gaussian mean. This time dependent
bias is a weighted contribution from the bias vectors
associated with each centroid (to be estimated discriminatively).
The contribution weights are calculated as the posteriors of the
observation (and neighbouring observations) given the centroids.
The result is an fMPE model.
On the other hand, the variance of each dimension may also be scaled
by a positive time dependent factor to yield a temporally varying
covariance matrix. This model is known as pMPE. As in fMPE,
the time dependent scale factor is a weighted contribution from
the centroid specific scales, where the weights are given by the posteriors
of the observations given the centroids.
Experimental results are given based on a large vocabulary conversational
telephone speech recognition task. Both fMPE and pMPE were found to give
gains over the MPE-only system. It was also found that combining fMPE and
pMPE could be beneficial in some cases. |
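The fMPE and pMPE constructions described in the abstract above can be illustrated with a minimal sketch. It is not the speaker's implementation: the function names, the shared isotropic variance used to compute centroid posteriors, the equal centroid priors, and the use of only the current observation (no neighbouring frames) are all assumptions made for illustration. The bias vectors and scales, which the abstract says are estimated discriminatively with MPE, are simply passed in as given values.

```python
import math


def centroid_posteriors(obs, centroids, var=1.0):
    """Posterior weight of each centroid for one observation.

    Assumes (for illustration only) equal priors and isotropic Gaussians
    with a shared variance `var` around each centroid.
    """
    log_liks = [
        -0.5 * sum((o - c) ** 2 for o, c in zip(obs, centroid)) / var
        for centroid in centroids
    ]
    peak = max(log_liks)  # subtract the maximum for numerical stability
    exps = [math.exp(ll - peak) for ll in log_liks]
    total = sum(exps)
    return [e / total for e in exps]


def time_varying_gaussian(mean, variance, obs, centroids, biases, scales):
    """Return the time dependent mean and diagonal variance for `obs`.

    fMPE-style mean:     static mean + posterior-weighted sum of biases.
    pMPE-style variance: static variance * posterior-weighted sum of scales.
    `biases` and `scales` hold one vector per centroid; scales must be positive.
    """
    post = centroid_posteriors(obs, centroids)
    dim = len(mean)
    bias_t = [sum(p * b[d] for p, b in zip(post, biases)) for d in range(dim)]
    scale_t = [sum(p * s[d] for p, s in zip(post, scales)) for d in range(dim)]
    mean_t = [mean[d] + bias_t[d] for d in range(dim)]
    var_t = [variance[d] * scale_t[d] for d in range(dim)]
    return mean_t, var_t


# Hypothetical toy values: two centroids in a 2-dimensional acoustic space.
centroids = [[0.0, 0.0], [10.0, 10.0]]
biases = [[1.0, 1.0], [-1.0, -1.0]]
scales = [[2.0, 2.0], [0.5, 0.5]]
mean_t, var_t = time_varying_gaussian(
    [0.0, 0.0], [1.0, 1.0], [0.0, 0.0], centroids, biases, scales
)
```

Because the weights are non-negative and sum to one, the combined scale stays positive whenever every centroid specific scale is positive, so the scaled variance remains valid.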
13th March 2006 | Martin Layton (MIL PhD) | Augmented Statistical Models for Speech Recognition | Recently there has been significant interest in developing new acoustic
models for speech recognition. One such model, which allows complex
dependencies to be represented, is the augmented statistical model. This
extends standard HMMs using a local exponential expansion of the HMM,
allowing additional dependencies to be incorporated. Unfortunately, the
resulting model often has an intractable normalisation term, rendering
training difficult for all but binary classification tasks. In this talk,
a maximum margin criterion is presented as a practical method of
estimating augmented model parameters for binary classification tasks.
For multi-class classification, conditional augmented (C-Aug) models are
proposed as an attractive alternative. Instead of modelling utterance
likelihoods and inferring decision boundaries, C-Aug models directly model
the posterior probability of class labels, conditioned on the utterance.
The resulting model is easy to normalise and can be trained using
conditional maximum likelihood estimation. In addition, because the model
is convex, the optimisation converges to a global maximum. |
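The idea of modelling class posteriors directly, as described for C-Aug models above, can be illustrated with a small sketch. It is a generic log-linear posterior model under stated assumptions, not the speaker's formulation: the feature vectors, the per-class weight vectors, and the function names are hypothetical. It shows only why normalisation is easy (the sum runs over a finite set of class labels) and what the conditional maximum likelihood objective looks like for one utterance.

```python
import math


def class_posteriors(features, weights):
    """Posterior over class labels for one utterance.

    `features[c]` is a hypothetical feature vector extracted from the
    utterance for class c; `weights[c]` is the matching parameter vector.
    The normaliser is a sum over the class labels only.
    """
    scores = {
        c: sum(w * f for w, f in zip(weights[c], features[c]))
        for c in features
    }
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {c: math.exp(s - peak) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def conditional_log_likelihood(features, weights, label):
    """Log posterior of the correct label: the per-utterance CML objective."""
    return math.log(class_posteriors(features, weights)[label])


# Hypothetical two-class example with symmetric features and weights.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
weights = {"a": [2.0, 0.0], "b": [0.0, 2.0]}
posteriors = class_posteriors(features, weights)
objective = conditional_log_likelihood(features, weights, "a")
```

In this log-linear form the log posterior is concave in the weights, which is the sense in which a gradient-based optimiser can reach a global maximum.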