MIL Speech Seminars 2006-2007
The MIL Speech Seminar series schedule for Easter Term 2007 is as follows:
| postponed! || Vidura Seneviratne (MIL PhD) ||
| 14th May 2007 || Roger Moore (Sheffield) || The Future of Spoken Language Processing: Where Do We Go From Here? ||
Recent years have seen steady improvements in the quality and performance of
speech-based human-machine interaction driven by a significant convergence
in the methods and techniques employed. However, the quantity of training
data required to improve state-of-the-art systems seems to be growing
exponentially, and performance appears to be asymptoting to a level that may
be inadequate for many real-world applications. This suggests that there
may be a fundamental flaw in the underlying architecture of contemporary
systems, as well as a failure to capitalize on the combinatorial properties
of human spoken language. The future direction for research into spoken
language processing is thus currently uncertain.
This talk addresses these issues by stepping outside the usual domains of
speech science and technology, drawing inspiration instead from recent
findings in the neurobiology of living systems. In particular, four areas
will be discussed: the growing evidence for an intimate relationship
between sensory and motor behaviour in living organisms; the power of
negative feedback control to accommodate unpredictable disturbances in
real-world environments; mechanisms for imitation and mental imagery in
learning and modelling; and hierarchical models of temporal memory for
predicting future behaviour and anticipating the outcome of events.
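As a minimal illustration of the second of these areas, the Python sketch
below shows a generic proportional negative-feedback loop: the output is
repeatedly corrected in proportion to the observed error, so an
unpredictable disturbance is absorbed without ever being modelled
explicitly. This is a toy example of the general principle only, not a
description of any system from the talk.

```python
import random

def feedback_loop(setpoint, gain=0.5, steps=50):
    """Proportional negative-feedback control: correct the output in
    proportion to the observed error, so unpredictable disturbances
    are absorbed without being modelled explicitly."""
    output = 0.0
    for _ in range(steps):
        output += random.uniform(-0.2, 0.2)  # unpredictable disturbance
        error = setpoint - output            # compare goal with observation
        output += gain * error               # corrective action
    return output

print(feedback_loop(setpoint=1.0))  # settles near 1.0 despite the noise
```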
The talk will conclude by showing how these results point towards a novel
architecture for speech-based human-machine interaction that blurs the
distinction between the core components of a traditional spoken language
dialogue system; an architecture in which cooperative and communicative
behaviour emerges as a by-product of a model of interaction where the system
has in mind the needs and intentions of a user, and a user has in mind the
needs and intentions of the system.
| 29th May 2007 || Jose B. Marino (UPC) || Reordering Models and Discriminative Alignment for N-gram Based SMT ||
N-gram based statistical machine translation (SMT) is a recent approach
introduced by the Speech Processing Group of the TALP Research Center
(Universitat Politècnica de Catalunya, Barcelona, Spain). This approach
has been shown to provide state-of-the-art performance in evaluation
campaigns over the last few years. In this seminar a short review of the
approach is given and new developments are addressed.
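The bilingual n-gram model at the heart of this approach treats a sentence
pair as a sequence of "tuples" (paired source and target segments) and
scores it with a standard n-gram model over those tuples. The following
toy sketch uses a bigram model with hand-segmented tuples and no smoothing,
both simplifications of the real system, just to illustrate the idea.

```python
from collections import defaultdict

def train_tuple_bigrams(corpus):
    """Relative-frequency bigram model over bilingual tuples, where each
    tuple pairs a source segment with its target segment."""
    unigram, bigram = defaultdict(int), defaultdict(int)
    for sentence in corpus:                    # a sentence is a list of tuples
        seq = [("<s>", "<s>")] + sentence
        for prev, cur in zip(seq, seq[1:]):
            unigram[prev] += 1
            bigram[(prev, cur)] += 1
    # No smoothing: only queries with a seen history are defined here.
    return lambda prev, cur: bigram[(prev, cur)] / unigram[prev]

corpus = [[("la", "the"), ("casa", "house")],
          [("la", "the"), ("voz", "voice")]]
p = train_tuple_bigrams(corpus)
print(p(("<s>", "<s>"), ("la", "the")))       # 1.0
print(p(("la", "the"), ("casa", "house")))    # 0.5
```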
Firstly, reordering of the source text is considered in order to
reproduce the word order of the target language. Two models are
introduced: one linguistically motivated (taking POS-tag or syntax-tree
information into account) and another based only on a statistical
characterization of the source-target alignment. Both approaches provide
the decoder with multiple reordering options; thus, the reordering
actually carried out is decided using all the information available
to the SMT system.
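A hypothetical sketch of the linguistically motivated variant: a single
POS-based swap rule proposes alternative source orders, and every
alternative is kept as an option for the decoder. The real models use
richer rule sets (or a purely statistical characterization) and typically
encode the options compactly, for example as a word lattice.

```python
def reorderings(tagged_source, swap_rules):
    """Generate candidate reorderings of the source sentence from
    POS-based swap rules; all candidates are handed to the decoder,
    which makes the final choice using the rest of its models.
    `tagged_source` is a list of (word, POS) pairs."""
    options = {tuple(tagged_source)}
    for i in range(len(tagged_source) - 1):
        if (tagged_source[i][1], tagged_source[i + 1][1]) in swap_rules:
            swapped = list(tagged_source)
            swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
            options.add(tuple(swapped))
    return options

# Spanish NOUN ADJ typically maps to English ADJ NOUN:
sentence = [("casa", "NOUN"), ("blanca", "ADJ")]
for option in reorderings(sentence, swap_rules={("NOUN", "ADJ")}):
    print([word for word, _ in option])
```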
Secondly, a discriminative framework for bilingual word alignment is
introduced. This new approach tunes the word alignment so as to directly
improve the performance of the SMT system.
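The abstract does not spell out the learning machinery; one generic way to
realize a discriminative alignment model is a linear feature-based scorer
with perceptron-style updates, sketched below with invented toy features.
In the framework described here the training signal would come from
end-to-end SMT performance rather than from hand-aligned references.

```python
def features(alignment):
    """Toy feature map over an alignment, i.e. a set of (i, j) word links."""
    return {"n_links": len(alignment),
            "n_diagonal": sum(1 for i, j in alignment if i == j)}

def score(alignment, weights):
    """Linear model: a weighted sum of alignment features."""
    return sum(weights.get(f, 0.0) * v for f, v in features(alignment).items())

def perceptron_update(preferred, guess, weights, lr=1.0):
    """Shift the weights so the preferred alignment outscores the guess."""
    for f, v in features(preferred).items():
        weights[f] = weights.get(f, 0.0) + lr * v
    for f, v in features(guess).items():
        weights[f] = weights.get(f, 0.0) - lr * v

weights = {}
preferred, guess = {(0, 0), (1, 1)}, {(0, 1), (1, 0)}
perceptron_update(preferred, guess, weights)
print(score(preferred, weights) > score(guess, weights))  # True
```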
Experimental results of both developments will be discussed.
| 25th June 2007 || Thomas Hain (Sheffield) || The AMI Meeting Transcription System ||
In this talk the AMI system for automatic transcription of speech
recorded in meetings is presented. Meetings are currently under
investigation in several large research projects that aim to facilitate
effective communication by providing automatic supportive tools. In the
AMI/AMIDA project these tools analyse audio-visual sensor input to
provide records, effective access for review, and aids for remote
participation.
Naturally, automatic speech recognition is an essential component of this
analysis. Even though speech recorded in meetings is conversational in
nature, several factors lead to poorer performance than that achieved on,
for example, conversational telephone speech. One reason is the more
complex acoustic scenario, for example when using recordings from
microphones located on a table in a reverberant room. Concurrent speech,
a more complex conversational structure and a wide range of topics
further degrade performance. Components of the AMI system addressing some
of these issues will be presented, together with an overview of the
development strategy and the architecture of the system submitted to the
NIST RT'07 evaluations.
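The abstract does not name the distant-microphone front end, but a
standard technique for tabletop microphone arrays is delay-and-sum
beamforming. The sketch below assumes the per-microphone delays are
already known; in practice they would be estimated, for example by
cross-correlation between channels.

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Time-align each microphone channel by its delay (in samples) and
    average, reinforcing the speaker while attenuating reverberation
    and off-axis noise."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    aligned = [ch[d:d + length] for ch, d in zip(channels, delays)]
    return np.mean(aligned, axis=0)

# Toy example: the same waveform reaches the second microphone 2 samples late.
clean = np.array([0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0])
mic1 = clean
mic2 = np.concatenate([np.zeros(2), clean])
print(delay_and_sum([mic1, mic2], delays=[0, 2]))  # recovers the waveform
```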