MIL Speech Seminars 2006-2007

The MIL Speech Seminar series schedule for Easter Term 2007 is as follows:

Postponed

Vidura Seneviratne (MIL PhD)

TBC

TBC

14th May 2007

Roger Moore (Sheffield)

The Future of Spoken Language Processing: Where Do We Go From Here?

Recent years have seen steady improvements in the quality and performance of speech-based human-machine interaction driven by a significant convergence in the methods and techniques employed. However, the quantity of training data required to improve state-of-the-art systems seems to be growing exponentially, and performance appears to be asymptoting to a level that may be inadequate for many real-world applications. This suggests that there may be a fundamental flaw in the underlying architecture of contemporary systems, as well as a failure to capitalize on the combinatorial properties of human spoken language. The future direction for research into spoken language processing is thus currently uncertain. This talk addresses these issues by stepping outside the usual domains of speech science and technology, and instead draws inspiration from recent findings in the neurobiology of living systems. In particular, four areas will be discussed: the growing evidence for an intimate relationship between sensor and motor behaviour in living organisms, the power of negative feedback control to accommodate unpredictable disturbances in real-world environments, mechanisms for imitation and mental imagery for learning and modelling, and hierarchical models of temporal memory for predicting future behaviour and anticipating the outcome of events. The talk will conclude by showing how these results point towards a novel architecture for speech-based human-machine interaction that blurs the distinction between the core components of a traditional spoken language dialogue system; an architecture in which cooperative and communicative behaviour emerges as a by-product of a model of interaction where the system has in mind the needs and intentions of a user, and a user has in mind the needs and intentions of the system.

29th May 2007

Jose B. Marino (UPC)

Reordering Models and Discriminative Alignment for N-gram Based SMT

N-gram based SMT is a recent approach to statistical machine translation (SMT) introduced by the Speech Processing Group of the TALP Research Center (Universitat Politècnica de Catalunya, Barcelona, Spain). This approach has been shown to provide state-of-the-art performance in the evaluation campaigns of the last few years. In this seminar a short review of the approach is given and new developments are addressed. Firstly, reordering of the source text is considered in order to reproduce the word order of the target language. Two models are introduced: one linguistically motivated (taking into account POS-tag or syntax-tree information) and another based only on a statistical characterization of the source-target alignment. Both approaches provide the decoder with multiple reordering options; thus, the reordering actually carried out is decided using all the information available to the SMT system. Secondly, a discriminative framework for bilingual word alignment is introduced. This framework aims to tune the word alignment so as to maximize the performance of the SMT system. Experimental results for both developments will be discussed.

25th June 2007

Thomas Hain (Sheffield)

The AMI Meeting Transcription System

In this talk the AMI system for automatic transcription of speech recorded in meetings is presented. Meetings are the subject of current investigation in several large research projects that develop automatic support tools to facilitate effective communication. In the AMI/AMIDA project these tools analyse audio-visual sensor input to provide records and effective access for revision, as well as aids for remote participation. Naturally, automatic speech recognition is an essential component of this analysis. Even though speech recorded in meetings is conversational by nature, several factors lead to poorer performance than that achieved, for example, on conversational telephone speech. One reason is the more complex acoustic scenario, for example when using recordings from microphones located on a table in a reverberant room. Concurrent speech, a more complex conversational structure and a wide range of topics lead to further degradation in performance. Components of the AMI system that address some of these issues will be presented, together with an overview of the development strategy and the architecture of the system submitted to the NIST RT'07s evaluations.