University of Cambridge, Department of Engineering


MIL Speech Seminars 2003-2004


The MIL seminar series schedule for the Lent Term 2003 was as follows:

January 28th 2003 - Matt Stuttle (MIL)
A Gaussian Mixture Model Front End for Speech Recognition

Fitting a Gaussian mixture model (GMM) to the smoothed speech spectrum allows an alternative set of features to be extracted from the speech signal. These features are shown to perform worse than MFCCs on recognition tasks, but possess information complementary to the standard MFCC parameterisation. Different methods for combining the GMM features with an MFCC parameterisation are outlined and results discussed. The performance of the GMM features on speech corrupted with coloured additive noise is also examined: techniques for noise robustness and compensation are investigated for the GMM features, and performance is evaluated on the RM task with additive noise. Finally, the use of feature-space transforms and speaker adaptation on the WSJ task is investigated.
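
As an illustration of the kind of front end described above, the following is a minimal sketch, not the speaker's actual implementation: it treats one frame's magnitude spectrum as a density over frequency, fits a GMM to it, and returns the sorted component means, standard deviations and weights as a feature vector. The frame length, number of components, and the resampling trick used to drive a standard EM fit are all assumptions made for this example.

```python
# Minimal sketch, assuming a single 25 ms frame of 16 kHz speech and 4 mixture
# components; the resampling trick (drawing frequency samples in proportion to
# spectral magnitude so a standard EM fit can be used) is an assumption of this
# example, not necessarily how the front end in the talk is implemented.
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_spectral_features(frame, sample_rate, n_components=4, n_samples=2000, seed=0):
    """Fit a GMM to the frame's magnitude spectrum and return [means, stds, weights]."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # Treat the normalised spectrum as a probability mass over frequency and
    # draw samples from it so that an off-the-shelf EM fit can be applied.
    pmf = spectrum / spectrum.sum()
    rng = np.random.default_rng(seed)
    samples = rng.choice(freqs, size=n_samples, p=pmf).reshape(-1, 1)

    gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(samples)
    order = np.argsort(gmm.means_.ravel())            # sort components by centre frequency
    means = gmm.means_.ravel()[order]
    stds = np.sqrt(gmm.covariances_.ravel()[order])
    weights = gmm.weights_[order]
    return np.concatenate([means, stds, weights])

# Example: a 12-dimensional feature vector for one 400-sample (25 ms) frame.
frame = np.random.randn(400)                          # stand-in for real speech samples
features = gmm_spectral_features(frame, sample_rate=16000)
```
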
February 11th 2003 - Mike Brookes (Imperial College, London)
Glottal Closure Identification in Voiced Speech

The ability to identify the instants of glottal closure in voiced speech enables larynx-synchronous processing techniques such as closed-phase LPC analysis. These techniques make it possible to separate the characteristics of the glottal excitation waveform from those of the vocal tract filter and to treat the two independently in subsequent processing. Applications include low bit-rate coding, data-driven techniques for speech synthesis, prosody extraction, voice morphing, speaker normalisation and speaker recognition. This talk will describe a two-stage technique for determining the glottal closure instants from the speech waveform. In the first stage, candidate closure instants are identified from the group delay of the LPC residual; in the second stage, dynamic programming is used to eliminate spurious candidates. The results obtained are compared with reference closure instants derived from direct measurement of larynx activity.
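
The two-stage idea can be sketched roughly as follows. This is a simplified illustration under assumed parameter values (LPC order, window length, minimum spacing), not the algorithm presented in the talk, and the final pruning step here is a greedy stand-in for the dynamic-programming stage.

```python
# Simplified sketch of the two stages; LPC order, window length and minimum spacing
# are assumed values, and select_instants is a greedy stand-in for the talk's
# dynamic-programming stage.
import numpy as np
import librosa
from scipy.signal import lfilter

def lpc_residual(x, order=16):
    a = librosa.lpc(x, order=order)      # LPC coefficients, a[0] == 1
    return lfilter(a, [1.0], x)          # inverse filter A(z) gives the residual

def average_group_delay(residual, win=200):
    """Energy-weighted average delay (in samples) of a sliding window, centred on zero."""
    n = np.arange(win)
    d = np.zeros(len(residual) - win)
    for i in range(len(d)):
        seg = residual[i:i + win] ** 2
        d[i] = np.sum(n * seg) / (np.sum(seg) + 1e-12) - win / 2.0
    return d

def candidate_instants(d):
    """Stage 1: candidates at negative-going zero crossings of the average group delay."""
    return np.where((d[:-1] > 0) & (d[1:] <= 0))[0]

def select_instants(candidates, residual, min_gap=40):
    """Stage 2 (simplified): keep the strongest candidate within each min_gap span."""
    selected = []
    for c in sorted(candidates, key=lambda i: -abs(residual[i])):
        if all(abs(c - s) >= min_gap for s in selected):
            selected.append(int(c))
    return sorted(selected)

# Usage sketch (x: a voiced speech segment as a float numpy array at e.g. 16 kHz):
#   r = lpc_residual(x)
#   gci = select_instants(candidate_instants(average_group_delay(r)), r)
```
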
February 25th 2003 - Simon King (Edinburgh)
Underlying Representations for Speech Modelling

I will report on various current research projects at CSTR which have one thing in common: they attempt to model speech using an underlying, possibly hidden, representation rather than modelling speech directly in the spectral domain. This representation might be articulatory, pseudo-articulatory, merely inspired by articulation, or even phonological. In the talk I will concentrate mainly on new ways of measuring join cost in unit-selection synthesis, but will also touch on other projects including speech recognition, acoustic-articulatory inversion, and speech signal processing and modification.
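
For context, a conventional join cost of the kind the talk takes as its starting point can be written as a weighted distance between the boundary frames of two candidate units; the features (MFCC frames plus F0), weights and distance below are illustrative assumptions rather than the new measures described in the talk.

```python
# Minimal sketch of a conventional join cost; feature choice (MFCC frames plus F0),
# weights and the Euclidean distance are illustrative assumptions.
import numpy as np

def join_cost(left_mfcc, right_mfcc, left_f0, right_f0, w_spec=1.0, w_f0=0.1):
    """Cost of joining two candidate units: spectral + F0 mismatch at the boundary frames."""
    spectral_term = np.linalg.norm(left_mfcc[-1] - right_mfcc[0])   # last frame vs first frame
    f0_term = abs(left_f0[-1] - right_f0[0])
    return w_spec * spectral_term + w_f0 * f0_term
```
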
March 11th 2003 - Yulan He (MIL)
Semantic Processing using Hidden Vector State Models

A Hidden Vector State (HVS) model has been proposed and developed for hierarchical semantic parsing. The model associates each state of a push-down automaton with a state of an HMM. State transitions are factored into separate stack pop and push operations and then constrained to give a tractable search space. The result is a model which is complex enough to capture hierarchical structure but which can be trained automatically from unannotated data. Experiments have been conducted on data from the DARPA Communicator Travel task, and the results show that the HVS model can be robustly trained from only minimally annotated corpus data. Furthermore, the HVS model outperforms a conventional finite-state semantic tagger by 36% in F-measure and 25% in goal detection accuracy.
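
A toy sketch of the factored stack transition described above: each HMM state is modelled as a bounded stack of semantic concept labels, and a transition pops some number of concepts and then pushes at most one new one. The depth bound and the concept labels in the example are illustrative assumptions; in the real model the pop and push operations carry learned probabilities.

```python
# Toy sketch of a factored HVS transition; MAX_DEPTH and the concept labels in the
# example are illustrative assumptions, and the real model attaches learned
# probabilities to the pop and push operations.
from typing import Optional, Tuple

MAX_DEPTH = 4  # bounding the stack depth keeps the state space, and hence the search, tractable

def transition(stack: Tuple[str, ...], n_pop: int, push: Optional[str]) -> Tuple[str, ...]:
    """Apply one factored transition: pop n_pop concepts, then optionally push one new concept."""
    if n_pop > len(stack):
        raise ValueError("cannot pop more concepts than are on the stack")
    new_stack = stack[:len(stack) - n_pop]
    if push is not None:
        if len(new_stack) >= MAX_DEPTH:
            raise ValueError("stack depth bound exceeded")
        new_stack = new_stack + (push,)
    return new_stack

# Illustrative example: extending a flight-query parse with a destination city.
state = ('SS', 'FLIGHT')
state = transition(state, n_pop=0, push='TOLOC')
state = transition(state, n_pop=0, push='CITY')
print(state)   # ('SS', 'FLIGHT', 'TOLOC', 'CITY')
```
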