The MIL Speech Seminar series schedule for Easter Term 2006 is as follows:
8th May 2006 | Jason Williams (MIL PhD) | Using Partially-Observable Markov Decision Processes for Spoken Dialogue Management | In a spoken dialog system, the role of the dialog manager is to decide what
actions to take over time to help a user achieve their goal. This task is
difficult in large part because speech recognition errors are common,
introducing uncertainty in the current state of the conversation. In our
research, we seek to model this uncertainty explicitly, and to apply machine
learning techniques to generate dialog managers that cope with this
uncertainty. Partially Observable Markov Decision Processes, or POMDPs,
present an attractive framework in this pursuit.
In this talk, a method for formulating a dialog manager as a POMDP is
presented. In the first part of the talk the motivation for the POMDP
approach is discussed. By factoring the elements of the POMDP, models of
user behavior and of speech recognition errors are directly incorporated.
Results show that, on a small dialog management task, the POMDP approach
outperforms a typical baseline from the literature.
To date, POMDPs for dialog management have scaled poorly, and have been
limited to artificially small "toy" problems. In the second part of the
talk, a novel approach - called "Composite Summary point-based value
iteration" - is presented, which scales POMDPs to handle slot-based dialog
management problems of a realistic size. The technique is evaluated with a
user model estimated from real dialog data, and results demonstrate the
operation and scalability of the method. |
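The abstract above centres on tracking uncertainty over the dialog state. The short Python sketch below is illustrative only (not code from the talk); it uses a hypothetical two-city slot with made-up confusion probabilities to show the standard POMDP belief update such a dialog manager performs after each system action and noisy recognition result.

    # Standard POMDP belief update:
    #   b'(s') is proportional to P(o | s', a) * sum_s P(s' | s, a) * b(s)
    def update_belief(belief, action, observation, trans_prob, obs_prob):
        new_belief = {}
        for s_next in belief:
            predicted = sum(trans_prob(s_next, s, action) * belief[s] for s in belief)
            new_belief[s_next] = obs_prob(observation, s_next, action) * predicted
        norm = sum(new_belief.values())
        return {s: p / norm for s, p in new_belief.items()}

    # Hypothetical toy slot: the hidden state is the user's goal city, and the
    # recogniser confuses the two city names 20% of the time (made-up numbers).
    def trans_prob(s_next, s, action):
        return 1.0 if s_next == s else 0.0   # assume the user's goal does not change

    def obs_prob(obs, s, action):
        return 0.8 if obs == s else 0.2      # crude model of recognition errors

    belief = {"boston": 0.5, "austin": 0.5}
    belief = update_belief(belief, "ask_city", "boston", trans_prob, obs_prob)
    print(belief)   # mass shifts towards "boston" but "austin" is not discarded

A POMDP dialog manager chooses its next action (confirm, re-ask, commit, etc.) as a function of this whole belief rather than of a single best recognition hypothesis; the value-iteration work in the second half of the talk addresses computing such policies at realistic scale.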
31st May 2006 | Kai Yu (MIL PhD) | Unsupervised Bayesian Adaptation on Adaptively Trained Systems | As the use of non-homogeneous data, such as telephone conversational
speech, increases, more and more systems are built using adaptive training.
A canonical model is used to represent the "pure" speech variability and a
set of transforms is employed to represent the unwanted acoustic
variabilities, e.g. speaker and acoustic environment changes. During
adaptation and recognition, the canonical model must be adapted by
transforms estimated using test-domain-specific data before being used for
decoding. However, in unsupervised adaptation, where no correct
transcription is available for the supervision data, transforms reflecting
the non-speech variabilities of the test domain cannot be generated before
decoding, so the direct use of the canonical model becomes a problem. This
talk introduces a Bayesian framework for unsupervised adaptation on
adaptively trained systems. Within this framework, transform parameters are
assumed to be random and the prior distribution of transform parameters
obtained in training will be used to "adapt" the canonical model. Then, the
marginal likelihood of each possible hypothesis sequence is used for
inference, which is calculated as an integral over the transform prior
distribution. This Bayesian integral is intractable, hence approximations
are required. Two types of schemes, lower bound approximation and direct
approximation, are discussed in detail. An efficient recursive formula for
incremental Bayesian adaptation is also derived for the lower bound
approximation. Experiments were performed on a telephone conversational
speech recognition task in both batch and incremental modes. In batch
adaptation with very limited data, the use of a non-point transform
distribution significantly outperformed the other Bayesian approaches. The incremental
adaptation experiment showed that with more adaptation data available, the
point estimate of the transforms became reasonable and performed close to
the non-point transform distributions. Similar trends were observed for both ML and
discriminative adaptively trained systems. However, the gains on
discriminative systems were smaller, owing to the use of an ML transform prior
distribution. |
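To make the inference described above concrete, here is a rough sketch in assumed notation (the symbols are this note's, not taken from the talk): with observations O, hypothesis H, canonical model parameters \lambda, and transform T with prior p(T) estimated during adaptive training, recognition picks the hypothesis with the highest marginal likelihood,

    \hat{H} = \arg\max_H \int p(O \mid H, T, \lambda)\, p(T)\, dT .

This integral is the intractable quantity the abstract mentions. A Jensen-type variational bound is one standard way to form the lower-bound approximation (the scheme in the talk may differ in detail):

    \log \int p(O \mid H, T, \lambda)\, p(T)\, dT
        \ge \int q(T) \log \frac{p(O \mid H, T, \lambda)\, p(T)}{q(T)}\, dT ,

which holds for any distribution q(T) and is tightest when q(T) matches the transform posterior. Collapsing q(T) onto a single value recovers a point-estimate transform, which is the contrast drawn in the incremental-adaptation results above.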