The MIL Speech Seminar series schedule for the Long Vacation 2009 was as follows:
1st September 2009 | Scott Novotney (HLTCOE and BBN Technologies) | Factors Affecting ASR Model Self-Training |
Low-resource ASR self-training seeks to minimize resource requirements such as manual transcriptions or language modeling text. This is accomplished by training on large quantities of audio automatically labeled by a small initial model (a schematic sketch of this loop follows the entry). By analyzing our previous experiments with the conversational telephone English Fisher corpus, we demonstrate where self-training succeeds and under what resource conditions it provides the most benefit. Additionally, we will show success on Spanish and Levantine conversational speech as well as the tougher English CallHome set, despite initial WER of more than 60%. Finally, by digging beneath average word error rate and analyzing individual word performance, we show that self-trained models successfully learn new words. More importantly, self-training most benefits words which appear in the unlabeled audio but do not appear in the manual transcriptions.
|
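For readers unfamiliar with self-training, the loop described in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the speaker's code: train_asr, transcribe_with_confidence, and the confidence threshold are hypothetical stand-ins for a real ASR training pipeline, decoder, and filtering rule.

```python
# Schematic sketch of ASR self-training (illustrative, not the speaker's system).
from dataclasses import dataclass

@dataclass
class Utterance:
    audio_id: str
    transcript: str      # manual or automatic transcript
    confidence: float = 1.0

def train_asr(labeled):
    """Stand-in for acoustic/language model training on transcribed audio (stub)."""
    return {"vocab": {w for u in labeled for w in u.transcript.split()}}

def transcribe_with_confidence(model, audio_id):
    """Stand-in for decoding one unlabeled utterance; returns (hypothesis, confidence)."""
    return ("hello world", 0.7)

def self_train(seed_labeled, unlabeled_ids, threshold=0.6, rounds=2):
    """Train a small seed model, then iteratively retrain on audio it labels itself."""
    model = train_asr(seed_labeled)
    for _ in range(rounds):
        auto = []
        for audio_id in unlabeled_ids:
            hyp, conf = transcribe_with_confidence(model, audio_id)
            if conf >= threshold:          # keep only confident automatic labels
                auto.append(Utterance(audio_id, hyp, conf))
        model = train_asr(list(seed_labeled) + auto)  # retrain on seed + automatic
    return model

seed = [Utterance("utt0", "buenos dias")]
model = self_train(seed, ["utt1", "utt2"])
```

The confidence filter is one common design choice for keeping errorful automatic transcripts from degrading the retrained model; the abstract itself analyzes when such retraining helps rather than prescribing a particular filter.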
4th September 2009 | Jen-Tzung Chien (National Cheng Kung University, Taiwan) | Bayesian Learning Approaches for Speech Recognition |
In this talk, I will present my previous and ongoing studies on Bayesian learning for speech recognition. In speech recognition, Bayesian adaptation has been widely used to deal with the issue of speaker adaptation, where the likelihood function of the adaptation data and the prior density of the existing model are merged to find the adapted model for a new speaker (the standard criterion is sketched after this entry). Such a Bayesian learning approach is useful not only for model adaptation but also for model regularization, where regularized hidden Markov models (HMMs) are good for prediction of unknown test data. The regularized HMMs can be applied for decision tree state tying in a data generation model and can even be integrated with a large margin classifier to improve the generalization of a discriminative model based on large margin HMMs. Furthermore, Bayesian learning is beneficial for topic-based language modeling under the paradigm of latent Dirichlet allocation (Blei et al., 2003). A Bayesian topic-based language model will be presented for speech recognition. This regularized language model is established according to the marginal likelihood over the uncertainties of latent topics and topic mixtures. The topic information is extracted from the n-gram events and directly applied for speech recognition. Finally, I will summarize my viewpoints on Bayesian learning and address other challenging topics in machine learning methods for speech.
|
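As a reading aid, not part of the abstract, the two Bayesian constructions mentioned above can be written in standard notation; the symbols below are conventional choices, not necessarily the speaker's.

```latex
% MAP (Bayesian) adaptation: merge the likelihood of the adaptation data
% \mathcal{X} with the prior density p(\lambda) of the existing model
% to obtain the adapted model for a new speaker.
\[
  \hat{\lambda}_{\mathrm{MAP}} \;=\; \arg\max_{\lambda}\; p(\mathcal{X} \mid \lambda)\, p(\lambda)
\]

% Marginal likelihood of a word sequence w_1,\dots,w_N under latent Dirichlet
% allocation (Blei et al., 2003): integrate over the topic mixture \theta and
% sum over the latent topic assignment z_n of each word.
\[
  p(w_1,\dots,w_N \mid \alpha, \beta)
  \;=\; \int p(\theta \mid \alpha)\,
        \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)\; d\theta
\]
```

The second expression is the marginalization "over the uncertainties of latent topics and topic mixtures" that the abstract describes as the basis of the regularized topic-based language model.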