Abstract for rosti_tr461

Cambridge University Engineering Department Technical Report CUED/F-INFENG/TR461


A-V.I. Rosti & M.J.F. Gales

December 12, 2003

This paper describes the application of Rao-Blackwellised Gibbs sampling (RBGS) to speech recognition using switching linear dynamical systems (SLDSs) as the acoustic model. The SLDS is a hybrid of the standard hidden Markov model (HMM) and the linear dynamical system. It extends the stochastic segment model (SSM), in which segments are assumed to be independent. SLDSs explicitly take into account the strong co-articulation present in speech by using a Gauss-Markov process in a low-dimensional latent state space. Unfortunately, inference in SLDSs is intractable unless the discrete state sequence is known. RBGS is one approach that may be applied for both improved training and decoding with this form of intractable model. The theory of SLDSs and RBGS is described, along with an efficient proposal distribution. The performance of the SLDS and the SSM, using RBGS for training and inference, is evaluated on the ARPA Resource Management task.
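
To make the difference between the two models concrete, the following is a minimal sketch of the generative process of a scalar SLDS. It is not the implementation from the report. It assumes the textbook form of a switching state-space model: a discrete Markov chain selects which linear-Gaussian dynamics and observation parameters are active at each frame. The function name `sample_slds`, all parameter names, the scalar state and the `reset_on_switch` flag are hypothetical and chosen for illustration only. Setting `reset_on_switch=True` redraws the continuous state whenever the discrete state changes, which is one way to mimic the independent-segment assumption attributed to the SSM above.

```python
import random


def sample_slds(T, trans, a, q_var, c, r_var, reset_on_switch=False, seed=0):
    """Draw T frames from a scalar switching linear dynamical system.

    All names here are illustrative assumptions, not taken from the report.
    trans[i][j] : probability of moving from discrete state i to j
    a[q], q_var[q] : state transition coefficient and state noise variance
    c[q], r_var[q] : observation coefficient and observation noise variance
    """
    rng = random.Random(seed)
    n_states = len(trans)
    q = rng.randrange(n_states)      # initial discrete state (uniform, assumed)
    x = rng.gauss(0.0, 1.0)          # initial continuous state (assumed prior)
    states, obs = [], []
    for t in range(T):
        if t > 0:
            prev = q
            # Discrete state follows a first-order Markov chain, as in an HMM.
            q = rng.choices(range(n_states), weights=trans[prev])[0]
            if reset_on_switch and q != prev:
                # Independent segments: forget the previous continuous state.
                x = rng.gauss(0.0, 1.0)
            else:
                # Gauss-Markov dynamics: the state carries over between frames,
                # including across changes of the discrete state.
                x = a[q] * x + rng.gauss(0.0, q_var[q] ** 0.5)
        states.append(q)
        # Noisy linear observation of the latent state.
        obs.append(c[q] * x + rng.gauss(0.0, r_var[q] ** 0.5))
    return states, obs


# Example: two discrete states with different dynamics (values are arbitrary).
trans = [[0.9, 0.1], [0.2, 0.8]]
states, obs = sample_slds(
    T=20, trans=trans, a=[0.95, 0.5], q_var=[0.1, 0.3],
    c=[1.0, -1.0], r_var=[0.05, 0.05],
)
```

When the discrete sequence `states` is given, the remaining model is an ordinary linear-Gaussian system, which is why exact inference is possible only in that case.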

