University of Cambridge, Department of Engineering

MIL Speech Seminars 2005-2006

The MIL Speech Seminar series schedule for Easter Term 2006 is as follows:

8th May 2006 Jason Williams (MIL PhD)
Using Partially-Observable Markov Decision Processes for Spoken Dialogue Management

In a spoken dialog system, the role of the dialog manager is to decide what actions to take over time to help a user achieve their goal. This task is difficult in large part because speech recognition errors are common, introducing uncertainty about the current state of the conversation. In our research, we seek to model this uncertainty explicitly and to apply machine learning techniques to generate dialog managers that cope with it. Partially observable Markov decision processes (POMDPs) present an attractive framework for this pursuit. In this talk, a method for formulating a dialog manager as a POMDP is presented. The first part of the talk discusses the motivation for the POMDP approach. By factoring the elements of the POMDP, models of user behavior and of speech recognition errors are incorporated directly. Results show that, on a small dialog management task, the POMDP approach outperforms a typical baseline from the literature. To date, POMDPs for dialog management have scaled poorly and have been limited to artificially small "toy" problems. The second part of the talk presents a novel approach, called "Composite Summary point-based value iteration", which scales POMDPs to handle slot-based dialog management problems of realistic size. The technique is evaluated with a user model estimated from real dialog data, and the results demonstrate the operation and scalability of the method.
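The belief update at the heart of any POMDP-based dialog manager can be sketched as follows. This is a minimal illustration of the general technique, not the system from the talk; the two-goal state space and the transition and observation functions `T` and `O` are invented for the example.

```python
def belief_update(belief, action, observation, T, O):
    """One POMDP belief update: b'(s') is proportional to
    O(o | s', a) * sum_s T(s' | s, a) * b(s), then normalized."""
    new_belief = {}
    for s2 in belief:
        # Predict the next state, then weight by how well that state
        # explains the (possibly misrecognized) observation.
        predicted = sum(T[(s, action, s2)] * belief[s] for s in belief)
        new_belief[s2] = O[(s2, action, observation)] * predicted
    norm = sum(new_belief.values())
    return {s: p / norm for s, p in new_belief.items()}

# Toy example: the user's goal is "a" or "b" and never changes, and the
# speech recognizer returns the correct word 70% of the time.
goals = ["a", "b"]
T = {(s, "ask", s2): 1.0 if s == s2 else 0.0 for s in goals for s2 in goals}
O = {(s2, "ask", o): 0.7 if o == s2 else 0.3 for s2 in goals for o in goals}

belief = {"a": 0.5, "b": 0.5}
belief = belief_update(belief, "ask", "a", T, O)
print(belief)  # belief in goal "a" rises from 0.5 to 0.7
```

Rather than committing to the single most likely state, the manager keeps the full distribution, which is what lets it act sensibly under recognition errors.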
31st May 2006 Kai Yu (MIL PhD)
Unsupervised Bayesian Adaptation on Adaptively Trained Systems

As the use of non-homogeneous data, such as telephone conversational speech, increases, more and more systems are built using adaptive training. A canonical model is used to represent the "pure" speech variability, and a set of transforms is employed to represent the unwanted acoustic variabilities, e.g. speaker and acoustic environment changes. During adaptation and recognition, the canonical model must be adapted by transforms estimated on test-domain-specific data before being used for decoding. However, in unsupervised adaptation, where no correct transcription is available for the supervision data, transforms reflecting the non-speech variabilities of the test domain cannot be generated before decoding, so the direct use of the canonical model becomes a problem. This talk introduces a Bayesian framework for unsupervised adaptation of adaptively trained systems. Within this framework, the transform parameters are treated as random variables, and the prior distribution of transform parameters obtained in training is used to "adapt" the canonical model. The marginal likelihood of each possible hypothesis sequence, calculated as an integral over the transform prior distribution, is then used for inference. This Bayesian integral is intractable, so approximations are required. Two types of scheme, lower bound approximation and direct approximation, are discussed in detail. An efficient recursive formula for incremental Bayesian adaptation is also derived for the lower bound approximation. Experiments were performed on a telephone conversational speech recognition task in both batch and incremental modes. In batch adaptation with very limited data, using a non-point transform distribution significantly outperformed other Bayesian approaches. The incremental adaptation experiments showed that, as more adaptation data became available, the point estimate of the transforms became reasonable and close to the non-point transform distributions. Similar trends were observed for both ML and discriminatively adaptively trained systems; however, the gains on the discriminative systems were smaller due to the use of an ML transform prior distribution.
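The Bayesian decoding rule described above scores each hypothesis H by the marginal likelihood p(O | H), an integral of p(O | H, W) over the transform prior p(W), rather than by a single point-estimated transform. The sketch below is a deliberately simplified illustration of that idea only: one-dimensional Gaussian models, an additive bias as the transform W, and a discrete grid standing in for the lower-bound and direct approximations discussed in the talk. All names and numbers are invented for the example.

```python
import math

def gaussian_loglik(obs, mean, var=1.0):
    """Log-likelihood of a list of observations under a 1-D Gaussian."""
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
               for x in obs)

def marginal_loglik(obs, hyp_mean, transform_grid, prior_weights):
    """log p(O | H) approximated as log sum_W p(W) * p(O | H, W),
    where W is an additive bias applied to the hypothesis model mean."""
    total = sum(pw * math.exp(gaussian_loglik(obs, hyp_mean + w))
                for w, pw in zip(transform_grid, prior_weights))
    return math.log(total)

# Two competing hypotheses (model means 0.0 and 2.0), a little test data,
# and a non-point transform prior over three possible bias values.
obs = [0.1, -0.2, 0.3]
grid, prior = [-0.5, 0.0, 0.5], [0.25, 0.5, 0.25]

bayes_h0 = marginal_loglik(obs, 0.0, grid, prior)
bayes_h1 = marginal_loglik(obs, 2.0, grid, prior)
# A point transform estimate is the special case of a one-point "prior".
point_h0 = marginal_loglik(obs, 0.0, [0.0], [1.0])
print(bayes_h0 > bayes_h1)  # the hypothesis matching the data wins
```

The point estimate falls out as a degenerate one-point prior, which mirrors the abstract's finding: with ample adaptation data the point estimate approaches the non-point distribution, while with very limited data the spread of the prior matters.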