The MIL Speech Seminar series schedule for Easter Term 2008 is as follows:
6th May 2008 | Dan Jurafsky (Stanford University) | Inducing Meaning from Text |
Online models of word meaning (like dictionaries and thesauri) or
world knowledge (like scripts or narratives) are crucial for natural
language understanding. Could we learn these meanings automatically
from text? I first report on joint work with Rion Snow and Andrew
Ng on inducing the meaning of words from text on the Web in the
context of augmenting WordNet, a large online thesaurus of English.
These include a semi-supervised method for learning when a new word
is a 'hypernym' of another word (the 'is-a' relation), a new
probabilistic algorithm for combining evidence from multiple relation
detectors, and an algorithm for clustering the induced word senses.
I then report on joint work with Nate Chambers on inducing
'narratives': script-like sequences of events that follow a
protagonist. This work includes inducing the relations between
events, ordering the relations, and clustering them into prototype
narratives.
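The semi-supervised hypernym learning described above builds on lexico-syntactic patterns. As a minimal sketch of the idea (the patterns, names, and toy sentences here are illustrative, not the authors' implementation):

```python
import re

# A few classic lexico-syntactic (Hearst-style) patterns that signal a
# hypernym ('is-a') relation between two nouns. Illustrative only.
HEARST_PATTERNS = [
    re.compile(r"(\w+) such as (\w+)"),    # "animals such as dogs"
    re.compile(r"(\w+) and other (\w+)"),  # "dogs and other animals"
    re.compile(r"(\w+) including (\w+)"),  # "animals including dogs"
]

def extract_hypernyms(text):
    """Return (hyponym, hypernym) pairs matched by the patterns above."""
    pairs = []
    for pat in HEARST_PATTERNS:
        for m in pat.finditer(text.lower()):
            if " and other " in pat.pattern:
                # In this pattern the first noun is the hyponym.
                pairs.append((m.group(1), m.group(2)))
            else:
                # In these patterns the first noun is the hypernym.
                pairs.append((m.group(2), m.group(1)))
    return pairs

print(extract_hypernyms("animals such as dogs"))  # [('dogs', 'animals')]
```

Individual patterns like these are noisy; combining evidence from many such detectors probabilistically is where the evidence-combination algorithm mentioned above comes in.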
|
12th May 2008 | Jason Williams (AT&T) | Recent work on POMDP-based dialog systems at AT&T |
Building spoken dialog systems is difficult because speech recognition
errors are common and users' behavior is unpredictable, which introduces
uncertainty about the current state of the conversation. At AT&T, we have
been applying partially observable Markov decision processes (POMDPs) to
building these systems. We model the uncertainty in the dialog state
explicitly as a Bayesian network and apply machine learning techniques
to determine what the system should say or do.
In this talk, I'll review the overall approach of applying statistical
techniques and then describe two recent advances. First, because the
system must operate in real time, efficient Bayesian inference is
crucial, yet the set of possible dialog states is enormous. To solve
this, I'll present a technique which uses a particle filter to perform
approximate inference in real time. Second, to choose actions, ideally
we would like to combine the robustness of machine optimization with the
expertise of human designers. To tackle this, I'll present a method
which unifies human expertise with automatic optimization.
To illustrate these techniques, I'll provide examples of two dialog
systems: a voice dialer, and a troubleshooting system that helps users
restore connectivity on a failed DSL connection. Graphical displays
illustrate the operation of the techniques, and quantitative results
show that applying statistical techniques outperforms the traditional
method of building systems by hand.
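The particle-filter belief update described above can be sketched in miniature. Everything here (the goal set, the likelihood model, the scores) is a hypothetical toy, not AT&T's system:

```python
import random

random.seed(0)  # fixed seed so the toy run is repeatable

# Belief over a discrete hidden dialog state (here, the user's goal),
# represented by a population of particles. Hypothetical toy model.
GOALS = ["dial_alice", "dial_bob", "dial_carol"]

def obs_likelihood(goal, asr_hypothesis, confidence):
    """P(observation | goal): the ASR confidence if the hypothesis
    matches the goal, with the error mass spread over the other goals."""
    if goal == asr_hypothesis:
        return confidence
    return (1.0 - confidence) / (len(GOALS) - 1)

def update_belief(particles, asr_hypothesis, confidence):
    # Weight each particle by how well it explains the ASR observation,
    # then resample the population in proportion to those weights.
    weights = [obs_likelihood(g, asr_hypothesis, confidence)
               for g in particles]
    return random.choices(particles, weights=weights, k=len(particles))

# Start from a uniform belief: equal numbers of particles per goal.
particles = [g for g in GOALS for _ in range(100)]
particles = update_belief(particles, "dial_bob", confidence=0.8)
belief = {g: particles.count(g) / len(particles) for g in GOALS}
print(belief)  # belief mass now concentrates on "dial_bob"
```

Because the update cost scales with the number of particles rather than the (enormous) number of possible dialog states, this kind of approximation is what makes real-time inference feasible.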
|
19th May 2008 | Tomoki Toda (Nara Institute of Science and Technology) | Vocal Tract Transfer Function Estimation Using Factor Analyzed Trajectory Hidden Markov Model |
The estimation of the vocal tract transfer function (VTTF) of a
speech signal is an essential problem in speech processing. Because
the speech signal results from the convolution of the VTTF with a
quasi-periodic excitation signal, many frequency components between
adjacent harmonics of the fundamental frequency are missing, which
makes it hard to extract an accurate VTTF. To address
this problem, I propose a statistical approach to the offline VTTF
estimation based on a factor analyzed trajectory hidden Markov model
that effectively models harmonic components observed over an
utterance. This model is trained so that its likelihood for the
observed harmonic component sequences is maximized while considering
VTTF parameters as hidden variables. The trained model enables the
maximum a posteriori (MAP) estimation of a time-varying VTTF sequence
considering not only harmonic components at each analyzed frame but
also those at other frames to interpolate the missing frequency
components in a probabilistic manner. The effectiveness of the
proposed method is demonstrated by the results of a simulation
experiment.
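The missing-harmonics problem can be illustrated numerically. This toy sketch (a hypothetical resonance shape and a naive linear interpolation) only motivates the problem; the proposed model instead interpolates probabilistically, using harmonics observed across a whole utterance:

```python
# Toy illustration, not the proposed method: a quasi-periodic excitation
# only exposes the vocal tract transfer function H(f) at harmonics of
# the fundamental frequency f0, leaving the spectrum between harmonics
# unobserved.

def vttf(f):
    """Hypothetical smooth transfer function with a resonance near 500 Hz."""
    return 1.0 / (1.0 + ((f - 500.0) / 200.0) ** 2)

f0 = 150.0  # fundamental frequency in Hz
# Only the harmonic frequencies k*f0 are observable in the spectrum.
observed = {k * f0: vttf(k * f0) for k in range(1, 8)}
print(sorted(observed))  # 150.0, 300.0, ..., 1050.0 Hz only

def interpolate(f):
    """Crude linear fill-in for the unobserved band between harmonics."""
    k = int(f // f0)
    lo, hi = k * f0, (k + 1) * f0
    t = (f - lo) / f0
    return (1 - t) * vttf(lo) + t * vttf(hi)
```

Everything between adjacent harmonics (e.g. 150-300 Hz here) must be filled in from the observed samples; doing this jointly over time, with the VTTF parameters as hidden variables, is what the trajectory model above provides.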
|
27th May 2008 | Jim Hieronymus (NASA Ames Research Center) | Spoken Dialogue Systems for Space and Lunar Exploration |
Building spoken dialogue systems for space applications requires systems
which are flexible, portable to new applications, robust to noise, and
able to discriminate between speech intended for the system and
conversations with other astronauts and systems. Our systems are built
to be flexible by using general typed unification grammars for the
language models, which can be specialized using example data. These are
designed so that most sensible ways of expressing a request are
correctly recognized semantically. The language models are tuned with
extensive user feedback and data where available. The International
Space Station and the EVA suits are noisy (76 and 70 dB SPL,
respectively). This noise is best minimized by using active
noise-cancelling microphones, which permit accurate speech recognition.
Finally, open-microphone speech recognition is important for hands-free,
always-available operation. Out-of-domain utterance rejection in its
simplest form depends on careful adjustment of rejection thresholds for
both acoustic and natural language scores, so that out-of-domain
rejection is near 97% while the false rejection rate is around 5%. This
means that astronauts can talk to each other and by radio to the ground
without the system falsely recognizing a command or query. The effect of
statistical and linguistically motivated language modeling techniques
will be discussed and shown to be of comparable performance. A short
clip of the surface-suit spoken dialogue system being used in a field
test will be shown.
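In its simplest form, the out-of-domain rejection described above amounts to gating each utterance on two confidence scores. A minimal sketch; the threshold values and names below are illustrative, not the tuned operational settings:

```python
# Hypothetical rejection thresholds; in practice these are carefully
# tuned so that out-of-domain rejection is high (near 97%) while the
# false rejection rate stays low (around 5%).
ACOUSTIC_THRESHOLD = 0.6
NL_THRESHOLD = 0.5

def accept_utterance(acoustic_score, nl_score):
    """Treat the utterance as a command only if BOTH the acoustic and
    the natural language scores clear their thresholds."""
    return acoustic_score >= ACOUSTIC_THRESHOLD and nl_score >= NL_THRESHOLD

# Astronaut addressing the system: both scores high, so it is accepted.
print(accept_utterance(0.85, 0.7))  # True
# Cross-talk with another astronaut: low NL score, so it is rejected.
print(accept_utterance(0.80, 0.2))  # False
```

Raising either threshold trades false acceptances (cross-talk triggering commands) against false rejections (genuine commands being ignored), which is why both must be adjusted together.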
|
2nd June 2008 | Filip Jurcicek | Extended HVS Parser |
In this talk, I will present several extensions to the HVS parser. First,
the initialization of its parameters was modified using automatically
extracted negative examples. Second, the HVS parser was extended so that
it can produce not only right-branching parse trees but also
left-branching parse trees. Finally, the third modification enables the
parser to process not only the words on its input but also additional
features. Automatically obtained lemmas and morphological tags were used
as features, and they significantly improved performance. Because both
the original parser and the extended parser were implemented in GMTK
(the Graphical Models Toolkit), a brief description of the
implementation will also be given.
|
POSTPONED | Trung Bui (Twente) | TBC |
TBC
|