

MIL Speech Seminars 2003-2004


The MIL Speech Seminar series schedule for the Michaelmas Term 2003 was as follows:

October 21st 2003 Gunnar Evermann (MIL) Designing Fast LVCSR Systems
Large vocabulary speech recognition systems often require very significant amounts of compute time to achieve state-of-the-art recognition accuracy. This is true in particular for the large systems developed for the annual US government sponsored LVCSR evaluations. Typically these systems run around 200 times slower than real time (200xRT). Recently, there has been increased interest in building LVCSR systems designed to run much faster, for example in less than 10xRT. Developing fast systems with state-of-the-art recognition performance is explicitly one of the aims of the DARPA EARS project. This seminar will give a brief overview of the work funded under the EARS project at CUED and describe the structure and components of a typical evaluation-style LVCSR system, before discussing the "fast" (less than 10xRT) systems developed for the Broadcast News and Conversational Telephone Speech tasks. Results on the official Rich Transcription evaluation conducted earlier this year will be presented.
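For readers unfamiliar with the "xRT" notation used above, the real-time factor is simply processing time divided by audio duration. The short Python sketch below is purely illustrative (it is not part of the CUED/EARS systems) and only shows how the quoted figures translate into processing budgets.

```python
# Illustrative only: arithmetic behind the "xRT" (real-time factor) figures
# quoted in the abstract. Not part of the CUED/EARS systems.

def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """How many times slower than real time a decode ran."""
    return processing_seconds / audio_seconds

def processing_hours(audio_hours: float, rtf: float) -> float:
    """Processing time needed for a given amount of audio at a given xRT."""
    return audio_hours * rtf

if __name__ == "__main__":
    audio = 3.0  # hours of test audio (hypothetical figure)
    for rtf in (200.0, 10.0):
        print(f"{rtf:5.0f}xRT -> {processing_hours(audio, rtf):6.1f} CPU hours "
              f"for {audio} hours of audio")
```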
November 4th 2003 Lan Wang (MIL) Discriminative Speaker Adaptive Training Speaker Adaptive Training (SAT) applies speaker-specific training-set transforms in the HMM parameter optimization to improve the speaker-independent acoustic models (the canonical models). The use of discriminative training criteria for Speaker Adaptive Training (SAT) has been investigated, where both the transform generation and model parameter estimation are estimated using the minimum phone error (MPE) criterion. In a similar fashion to the use of I-smoothing for standard MPE training, a smoothing technique is introduced to avoid over-training when optimizing MPE-based feature-space transforms. Experiments on a Conversational Telephone Speech (CTS) transcription task demonstrate that MPE-based SAT models can reduce the word error rate over non-SAT MPE models by 1.0\% absolute, after lattice-based MLLR adaptation. Moreover, a simplified implementation of MPE-SAT with the use of constrained MLLR, in place of MPE-estimated transforms, is also discussed.
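As background for the criterion mentioned in this abstract, the standard MPE objective can be written as below (Povey-style formulation); the exact smoothing constants and transform parameterisation used in the talk are not reproduced here.

```latex
% Standard MPE objective, shown only as background for the abstract above.
\[
  \mathcal{F}_{\mathrm{MPE}}(\lambda)
  = \sum_{r=1}^{R}
    \frac{\sum_{W} p_\lambda(\mathbf{O}_r \mid W)^{\kappa}\, P(W)\, \mathrm{Acc}(W, W_r^{\mathrm{ref}})}
         {\sum_{W'} p_\lambda(\mathbf{O}_r \mid W')^{\kappa}\, P(W')}
\]
% Acc(W, W_r^ref) is the raw phone accuracy of hypothesis W against the
% reference transcription of utterance r, and kappa is an acoustic scaling
% factor. I-smoothing regularises the parameter update by adding tau
% "pseudo-counts" of ML statistics; the seminar's smoothing technique plays
% an analogous role for the MPE-estimated feature-space transforms.
```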
November 18th 2003 Nick Hughes (Robotics Research Group, Oxford) Probabilistic Models for Automated ECG Interval Analysis
The electrocardiogram (ECG) is an important non-invasive tool for assessing the condition of the heart. By examining the ECG signal in detail it is possible to derive a number of informative measurements from the characteristic ECG waveform. Perhaps the most important of these measurements is the "QT interval", which plays a crucial role in clinical drug trials. In particular, drug-induced prolongation of the QT interval (known as long QT syndrome) can result in a very fast abnormal heart rhythm which is often followed by sudden cardiac death. Failure to detect long QT syndrome is a serious issue, which has recently received significant media attention due to the unexpected side-effects of the antihistamine Triludan. In addition, the genetic form of long QT syndrome is believed to be responsible for the death of the footballer Marc-Vivien Foe during this summer's Confederations Cup. In this talk I will describe my work on developing probabilistic models for automatically segmenting ECG waveforms into their constituent waveform features. I will show how wavelet methods, and in particular the undecimated wavelet transform, can be used to provide a representation of the ECG which is more appropriate for subsequent modelling. I will then examine the use of hidden Markov models for segmenting the resulting wavelet coefficients, and show that the state durations implicit in a standard HMM are a poor match for the durations of real ECG features. This motivates the use of hidden semi-Markov models, which provide improved duration modelling and a more robust segmentation. Finally, I will discuss how these models can be adapted to changes in the patient's heart rate.
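A minimal sketch of the kind of pipeline described in this abstract is given below: undecimated (stationary) wavelet coefficients as per-sample features, followed by an off-the-shelf Gaussian HMM for segmentation. It uses PyWavelets and hmmlearn, neither of which is implied by the talk, and the hidden semi-Markov duration modelling that is central to the work is not captured by this sketch.

```python
# Sketch only: undecimated wavelet features + plain HMM segmentation of an
# ECG-like signal. Uses PyWavelets (pywt) and hmmlearn; the talk's hidden
# semi-Markov duration modelling is NOT implemented here.
import numpy as np
import pywt
from hmmlearn.hmm import GaussianHMM

def uwt_features(signal: np.ndarray, wavelet: str = "db2", level: int = 3) -> np.ndarray:
    """Stack undecimated wavelet detail coefficients as per-sample features."""
    # swt requires the length to be divisible by 2**level, so pad at the end.
    pad = (-len(signal)) % (2 ** level)
    padded = np.pad(signal, (0, pad), mode="edge")
    coeffs = pywt.swt(padded, wavelet, level=level)   # list of (cA, cD) per level
    details = np.stack([cD for _, cD in coeffs], axis=1)
    return details[: len(signal)]                     # drop the padding again

if __name__ == "__main__":
    # Synthetic stand-in for an ECG trace (the actual work used clinical data).
    t = np.linspace(0, 10, 4000)
    ecg_like = np.sin(2 * np.pi * 1.2 * t) + 0.1 * np.random.randn(len(t))

    X = uwt_features(ecg_like)
    # One HMM state per waveform region (e.g. P wave, QRS complex, T wave, baseline).
    hmm = GaussianHMM(n_components=4, covariance_type="diag", n_iter=20)
    hmm.fit(X)
    states = hmm.predict(X)   # per-sample segmentation into waveform regions
    print(states[:50])
```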
November 25th 2003 Ricky Chan (MIL) Lightly Supervised Discriminative Training
In state-of-the-art large vocabulary continuous speech recognition (LVCSR) systems, large amounts of acoustic training data are required to estimate the model parameters robustly. Producing high-quality manual transcriptions for the acoustic training data, however, is very time consuming and expensive, and thus limits the amount of audio data that can be used. It is therefore attractive to use large amounts of low-cost transcriptions for acoustic model training, an approach known as lightly supervised training. In this seminar, I will present lightly supervised discriminative training with large amounts of broadcast news data for which only partially correct closed-caption transcriptions are available. In particular, language models biased to the closed-caption transcripts are used to recognise the audio data, and the recognised transcripts are then used as the training transcriptions for acoustic model training. A range of experiments that use maximum likelihood (ML) training as well as discriminative training based on either maximum mutual information (MMI) or minimum phone error (MPE) will be presented. In a 5xRT broadcast news transcription system that includes adaptation, it is shown that reductions in word error rate (WER) in the range of 1% absolute can be achieved. Finally, some experiments on training data selection will be presented to compare different methods of "filtering" the transcripts.
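As an illustration of the kind of transcript "filtering" mentioned at the end of this abstract, the hedged sketch below keeps only those segments whose recognised transcript agrees closely with the closed captions, using word-level edit distance as a simple agreement measure; the actual selection criteria used in the work may differ.

```python
# Hypothetical sketch of lightly supervised data selection: keep a training
# segment only if the recogniser output agrees closely enough with the closed
# captions. The selection criteria in the actual system may differ.

def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[len(ref)][len(hyp)]

def select_segments(segments, max_mismatch: float = 0.2):
    """Keep (caption, hypothesis) pairs whose word mismatch rate is low."""
    kept = []
    for caption, hypothesis in segments:
        ref, hyp = caption.lower().split(), hypothesis.lower().split()
        rate = word_edit_distance(ref, hyp) / max(len(ref), 1)
        if rate <= max_mismatch:
            kept.append((caption, hypothesis))
    return kept

if __name__ == "__main__":
    data = [("the president said today", "the president said today"),
            ("markets rallied sharply", "markets fell sharply on news")]
    print(select_segments(data))   # only the closely matching segment survives
```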