The MIL Speech Seminar series schedule for Lent Term 2007 is as follows:
|16th January 2007||Steve Renals (Edinburgh)||Interpreting Multimodal Communication Scenes||
The AMI project is about interpreting human communication using
audio, video and other signals, in the context of multiparty
meetings. This is an interdisciplinary problem involving signal
processing and machine learning (how to make sense of multimodal
communication scenes), understanding the content of the recognized
signals, and developing useful applications.
In this talk I shall discuss some of the work we have been doing to
automatically annotate and interpret multimodal meeting recordings.
Specific issues that I'll discuss include the use of multistream
statistical models to segment meetings at different levels (dialogue
acts and "meeting actions") and approaches to meeting
|5th February 2007||Miles Osborne (Edinburgh)||Randomised Language Modelling for Statistical Machine Translation||
As is well known, translation performance improves as more and more
monolingual data is used to build the associated language models. However,
time and space considerations mean that researchers are typically forced to
use (smoothed) trigram models. Instead of using a cluster of machines,
per-sentence filtering or other slow techniques, we look at using a
randomised representation of the n-gram set as a means of building
higher-order n-gram models. The resulting space savings are dramatic: we can
represent a 5-gram model in around 100 MB of space and achieve translation
results similar to, if not better than, those obtained with traditional
language models. This opens the way for using language models trained on
trillions of words.
This is work in progress, joint with David Talbot.
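
To make the space/accuracy trade-off concrete, here is a minimal sketch of a Bloom-filter representation of an n-gram set, the kind of randomised data structure this line of work builds on. The class and parameter names below are illustrative assumptions, not the authors' implementation, and a real randomised language model would also encode (quantised) n-gram counts rather than bare set membership.

```python
import hashlib

class BloomNgramSet:
    """Minimal Bloom-filter sketch of an n-gram set (illustrative only).

    Membership queries may yield false positives but never false
    negatives; that one-sided error is the trade-off that lets a
    randomised model fit a high-order n-gram set into a fraction of
    the space a lossless representation would need.
    """

    def __init__(self, num_bits=2**24, num_hashes=3):
        # 2**24 bits = 2 MB here; scaling the filter up toward ~100 MB
        # is the regime the 5-gram figure quoted above corresponds to.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, ngram):
        # Derive num_hashes independent bit positions from one n-gram.
        key = " ".join(ngram).encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha1(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, ngram):
        for pos in self._positions(ngram):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, ngram):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(ngram))


if __name__ == "__main__":
    seen = BloomNgramSet()
    seen.add(("the", "cat", "sat", "on", "the"))
    print(("the", "cat", "sat", "on", "the") in seen)          # True
    print(("a", "totally", "unseen", "five", "gram") in seen)  # almost surely False
```

With k hash functions and m bits storing n n-grams, the false-positive rate is roughly (1 - e^(-kn/m))^k, so the error rate is small and tunable, which is what makes the dramatic space savings usable in practice.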
|26th February 2007||Zeynep Inanoglu (MIL PhD)||Transforming the Emotion In Speech: Conversion Techniques For Expressive Speech Synthesis||
Adding emotions to synthesized speech has become a high priority since the
emergence of concatenative synthesis methods, which generate highly
intelligible and natural output. In this project, we explore data-driven
prosody and voice quality modification techniques that enable a neutral
source emotion to be converted to some required target emotion without
changing the message, meaning or speaker identity.
Adding an emotion to neutral speech requires both voice quality and prosody
to be modified. For the former, two alternative methods of transforming the
short term spectra will be presented: a GMM-based linear transformation
method and a codebook-based selection approach. For prosody, phoneme
durations are transformed using a context-dependent relative decision tree
based on source and target durations. Intonation is modified using an
HMM-based modelling and generation technique. All three components of the
system are applied to neutral test data and evaluated using informal
listening tests and an independent emotion classifier.
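
To make the GMM-based linear transformation concrete, below is a minimal sketch of the standard joint-GMM conversion function: the conditional expectation of the target spectrum given a source frame, in the style commonly used for voice conversion. Whether the project uses exactly this formulation is an assumption, and the function and variable names are illustrative.

```python
import numpy as np

def gmm_spectral_transform(x, weights, means, covs):
    """Map one source (neutral) spectral frame x to a target (emotional)
    frame via the conditional expectation of a joint GMM.

    A sketch only: the joint GMM over stacked [source; target] feature
    vectors is assumed to have been trained beforehand (e.g. by EM on
    time-aligned neutral/emotional frame pairs).

    x:       (D,)          source spectral feature vector
    weights: (M,)          mixture weights
    means:   (M, 2*D)      stacked joint means [mu_x; mu_y]
    covs:    (M, 2*D, 2*D) joint covariances
    """
    D = x.shape[0]
    M = weights.shape[0]

    # Posterior p(m | x) of each component under the source marginal.
    log_post = np.empty(M)
    for m in range(M):
        mu_x = means[m, :D]
        cov_xx = covs[m, :D, :D]
        diff = x - mu_x
        _, logdet = np.linalg.slogdet(cov_xx)
        log_post[m] = (np.log(weights[m]) - 0.5 * logdet
                       - 0.5 * diff @ np.linalg.solve(cov_xx, diff))
    post = np.exp(log_post - log_post.max())
    post /= post.sum()

    # E[y | x] = sum_m p(m|x) * (mu_y_m + C_yx_m C_xx_m^{-1} (x - mu_x_m)):
    # each component contributes a linear transform of x.
    y = np.zeros(D)
    for m in range(M):
        mu_y = means[m, D:]
        cov_xx = covs[m, :D, :D]
        cov_yx = covs[m, D:, :D]
        y += post[m] * (mu_y + cov_yx @ np.linalg.solve(cov_xx, x - means[m, :D]))
    return y
```

Because every mixture component applies its own linear transform and the results are mixed by the component posteriors, the overall mapping is a soft piecewise-linear conversion of the short-term spectra, which is what "GMM-based linear transformation" refers to above.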
|12th March 2007||Jost Schatzmann (MIL PhD)||TBC||