The MIL Speech Seminar series schedule for Lent Term 2007 is as follows:
|16th January 2007||Steve Renals (Edinburgh)||Interpreting Multimodal Communication Scenes||
The AMI project is about interpreting human communication using
audio, video and other signals, in the context of multiparty
meetings. This is an interdisciplinary problem involving signal
processing and machine learning (how to make sense of multimodal
communication scenes), understanding the content of the recognized
signals, and developing useful applications.
In this talk I shall discuss some of the work we have been doing to
automatically annotate and interpret multimodal meeting recordings.
Specific issues that I'll discuss include the use of multistream
statistical models to segment meetings at different levels (dialogue
acts and "meeting actions") and approaches to meeting
|5th February 2007||Miles Osborne (Edinburgh)||Randomised Language Modelling for Statistical Machine Translation||
As is well known, translation performance improves as more and more
monolingual data is used to build the associated language models. However,
time and space considerations mean that researchers are typically forced to
use (smoothed) trigram models. Instead of using a cluster of machines,
per-sentence filtering or other slow techniques, we look at using a
randomised representation of the n-gram set as a means of building
higher-order n-gram models. The resulting space savings are dramatic: we can
represent a 5-gram model in around 100 MB of space and achieve translation
results similar to, if not better than, those obtained with traditional
language models. This opens the way for using language models trained on
trillions of words.
This is work in progress, joint with David Talbot.
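
To make the space/accuracy trade-off concrete, here is a minimal sketch of a Bloom-filter representation of an n-gram set, the kind of randomised data structure this line of work builds on. The class and parameter names below are illustrative assumptions, not the authors' implementation, and a real randomised language model would also encode (quantised) n-gram counts rather than bare set membership.

```python
import hashlib

class BloomNgramSet:
    """Minimal Bloom-filter sketch of an n-gram set (illustrative only).

    Membership queries may yield false positives but never false
    negatives; that one-sided error is the trade-off that lets a
    randomised model fit a high-order n-gram set into a fraction of
    the space a lossless representation would need.
    """

    def __init__(self, num_bits=2**24, num_hashes=3):
        # 2**24 bits = 2 MB here; scaling the filter up toward ~100 MB
        # is the regime the 5-gram figure quoted above corresponds to.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, ngram):
        # Derive num_hashes independent bit positions from one n-gram.
        key = " ".join(ngram).encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha1(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, ngram):
        for pos in self._positions(ngram):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, ngram):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(ngram))


if __name__ == "__main__":
    seen = BloomNgramSet()
    seen.add(("the", "cat", "sat", "on", "the"))
    print(("the", "cat", "sat", "on", "the") in seen)          # True
    print(("a", "totally", "unseen", "five", "gram") in seen)  # almost surely False
```

With k hash functions and m bits storing n n-grams, the false-positive rate is roughly (1 - e^(-kn/m))^k, so the error rate is small and tunable, which is what makes the dramatic space savings usable in practice.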
|26th February 2007||Zeynep Inanoglu (MIL PhD)||Transforming the Emotion In Speech: Conversion Techniques For Expressive Speech Synthesis||
Adding emotions to synthesized speech has become a high priority since the
emergence of concatenative synthesis methods, which generate highly
intelligible and natural output. In this project, we explore data-driven
prosody and voice quality modification techniques that enable a neutral
source emotion to be converted to some required target emotion without
changing the message, meaning or speaker identity.
Adding an emotion to neutral speech requires both voice quality and prosody
to be modified. For the former, two alternative methods of transforming the
short term spectra will be presented: a GMM-based linear transformation
method and a codebook-based selection approach. For prosody, phoneme
durations are transformed using a context-dependent relative decision tree
based on source and target durations. Intonation is modified using an
HMM-based modelling and generation technique. All three components of the
system are applied to neutral test data and evaluated using informal
listening tests and an independent emotion classifier.
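
To make the GMM-based linear transformation concrete, below is a minimal sketch of the standard joint-GMM conversion function: the conditional expectation of the target spectrum given a source frame, in the style commonly used for voice conversion. Whether the project uses exactly this formulation is an assumption, and the function and variable names are illustrative.

```python
import numpy as np

def gmm_spectral_transform(x, weights, means, covs):
    """Map one source (neutral) spectral frame x to a target (emotional)
    frame via the conditional expectation of a joint GMM.

    A sketch only: the joint GMM over stacked [source; target] feature
    vectors is assumed to have been trained beforehand (e.g. by EM on
    time-aligned neutral/emotional frame pairs).

    x:       (D,)          source spectral feature vector
    weights: (M,)          mixture weights
    means:   (M, 2*D)      stacked joint means [mu_x; mu_y]
    covs:    (M, 2*D, 2*D) joint covariances
    """
    D = x.shape[0]
    M = weights.shape[0]

    # Posterior p(m | x) of each component under the source marginal.
    log_post = np.empty(M)
    for m in range(M):
        mu_x = means[m, :D]
        cov_xx = covs[m, :D, :D]
        diff = x - mu_x
        _, logdet = np.linalg.slogdet(cov_xx)
        log_post[m] = (np.log(weights[m]) - 0.5 * logdet
                       - 0.5 * diff @ np.linalg.solve(cov_xx, diff))
    post = np.exp(log_post - log_post.max())
    post /= post.sum()

    # E[y | x] = sum_m p(m|x) * (mu_y_m + C_yx_m C_xx_m^{-1} (x - mu_x_m)):
    # each component contributes a linear transform of x.
    y = np.zeros(D)
    for m in range(M):
        mu_y = means[m, D:]
        cov_xx = covs[m, :D, :D]
        cov_yx = covs[m, D:, :D]
        y += post[m] * (mu_y + cov_yx @ np.linalg.solve(cov_xx, x - means[m, :D]))
    return y
```

Because every mixture component applies its own linear transform and the results are mixed by the component posteriors, the overall mapping is a soft piecewise-linear conversion of the short-term spectra, which is what "GMM-based linear transformation" refers to above.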
|12th March 2007||Jost Schatzmann (MIL PhD)||TBC||