UK Speech Conference 2017, University of Cambridge
Invited Speakers

Christine Evers : "Acoustic Scene Mapping for Robot Audition"

Abstract:
Recent advances in robotics and autonomous systems are rapidly leading to the evolution of machines that assist humans across the industrial, healthcare, and social sectors. For intuitive interaction between humans and machines, spoken language is a fundamental prerequisite. However, in realistic environments, speech signals are typically distorted by reverberation, noise, and interference from competing sound sources. Acoustic signal processing is therefore necessary in order to provide machines with the ability to learn, adapt, and react to stimuli in the acoustic environment. The processed, anechoic speech signals are naturally time-varying due to fluctuations of air flow in the vocal tract. Furthermore, motion of a human talker's head and body leads to spatio-temporal variations in source position and orientation, and hence time-varying source-sensor geometries. Therefore, in order to listen in realistic, dynamic multi-talker environments, robots need to be equipped with signal processing algorithms that recognize and constructively exploit the spatial, spectral, and temporal variations in the recorded signals. Bayesian inference provides a principled framework for the incorporation of temporal models capturing prior knowledge of physical quantities, such as the acoustic channel. This talk therefore explores the theory and application of Bayesian learning for robot audition, addressing novel advances in acoustic Simultaneous Localization and Mapping (aSLAM), sound source localization, and tracking.
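As a rough illustration of the Bayesian tracking framework the abstract refers to (and in no way the speaker's own algorithm), the sketch below tracks a moving sound source in one dimension with a Kalman filter: a constant-velocity motion model serves as the temporal prior, and noisy position estimates, such as those a localizer might produce, serve as the measurements. All parameter values are illustrative assumptions.

```python
import numpy as np

def kalman_track(measurements, dt=0.1, q=0.01, r=0.25):
    """Track [position, velocity] from noisy 1-D position measurements."""
    F = np.array([[1.0, dt], [0.0, 1.0]])    # constant-velocity dynamics
    H = np.array([[1.0, 0.0]])               # only position is observed
    Q = q * np.eye(2)                        # process noise covariance
    R = np.array([[r]])                      # measurement noise covariance
    x = np.array([[measurements[0]], [0.0]]) # initial state [pos, vel]
    P = np.eye(2)                            # initial state covariance
    track = []
    for z in measurements:
        # Predict: propagate the prior through the motion model
        x = F @ x
        P = F @ P @ F.T + Q
        # Update: fuse the prediction with the new measurement
        y = np.array([[z]]) - H @ x          # innovation
        S = H @ P @ H.T + R                  # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
        track.append(float(x[0, 0]))
    return track

rng = np.random.default_rng(0)
true_pos = np.linspace(0.0, 2.0, 50)              # source drifting across the scene
noisy = true_pos + rng.normal(0.0, 0.5, size=50)  # hypothetical localizer output
smoothed = kalman_track(noisy)
```

The filtered track follows the source far more closely than the raw localizer estimates, which is the sense in which the temporal prior is "exploited constructively".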

Christine Evers is an EPSRC Fellow at Imperial College London. She received her PhD from the University of Edinburgh, UK, in 2010, after having completed her MSc degree in Signal Processing and Communications at the University of Edinburgh in 2006, and BSc degree in Electrical Engineering and Computer Science at Jacobs University Bremen, Germany, in 2005. After a position as a research fellow at the University of Edinburgh between 2009 and 2010, she worked until 2014 as a senior systems engineer on RADAR tracking systems at Selex ES, Edinburgh, UK. She returned to academia in 2014 as a research associate in the Department of Electrical and Electronic Engineering at Imperial College London, focusing on acoustic scene mapping for robot audition. In 2017, she was awarded a fellowship by the UK Engineering and Physical Sciences Research Council (EPSRC) to advance her research on acoustic signal processing and scene mapping for socially assistive robots. Her research focuses on Bayesian inference for speech and audio applications in dynamic environments, including acoustic simultaneous localization and mapping, sound source localization and tracking, blind speech dereverberation, and sensor fusion. She is an IEEE Senior Member and a member of the IEEE Signal Processing Society Technical Committee on Audio and Acoustic Signal Processing.

Spyros Matsoukas : "Spoken Language Understanding in Alexa"

Abstract:
We will give an overview of Alexa's spoken language understanding components including wake-word detection, speech recognition, intent and named entity recognition, dialog management, and text-to-speech synthesis, and present a set of speech recognition and natural language understanding techniques we have developed as part of our continued efforts to enhance Alexa's conversational capabilities.
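To make the pipeline stages named above concrete, here is a toy sketch (in no way Alexa's implementation, where every stage is a learned neural model) of wake-word detection followed by intent and named-entity recognition on a transcribed request; the wake word, intent patterns, and entity list are all illustrative assumptions.

```python
import re

WAKE_WORD = "alexa"
INTENT_PATTERNS = {
    "PlayMusic": re.compile(r"\bplay\b"),
    "GetWeather": re.compile(r"\bweather\b"),
    "SetTimer": re.compile(r"\btimer\b"),
}
CITY_ENTITIES = {"london", "seattle", "cambridge"}

def understand(utterance):
    """Return (intent, entities) for a transcribed utterance, or None
    if the wake word is absent."""
    tokens = utterance.lower().split()
    if not tokens or tokens[0] != WAKE_WORD:
        return None  # wake-word stage rejects the audio
    request = " ".join(tokens[1:])
    # Intent recognition: first matching pattern wins
    intent = next((name for name, pat in INTENT_PATTERNS.items()
                   if pat.search(request)), "Unknown")
    # Named-entity recognition: dictionary lookup over tokens
    entities = [t for t in tokens if t in CITY_ENTITIES]
    return intent, entities

print(understand("Alexa what is the weather in Seattle"))
# -> ('GetWeather', ['seattle'])
```

Dialog management and text-to-speech synthesis would then act on the recognized intent; they are omitted here.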

Spyros Matsoukas is a Senior Principal Scientist in the Alexa Machine Learning organization at Amazon.com, developing spoken language understanding technology for voice-enabled products such as Amazon Echo. From 1998 to 2013 he worked at BBN Technologies, Cambridge, MA, conducting research in acoustic modeling for ASR, speaker diarization, statistical machine translation, speaker identification, and language identification. He has over 60 publications in peer-reviewed conferences and journals, with 3 best paper awards.

Vincent Wan : "Tutorial on Sequence Models using TensorFlow"

Abstract:
In this tutorial I'll be giving an in-depth introduction to the higher level TensorFlow APIs, including code snippets. I'll start from building simple graphs and running them. I'll cover feed forward neural networks, convolutional neural networks and finally show how to build a sequence to sequence model in TensorFlow. A basic knowledge of Python is assumed.
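The build-a-graph-then-run-it workflow the tutorial starts from can be mimicked in a few lines of plain Python; the toy sketch below is only an analogy for TensorFlow 1.x's placeholder/session pattern (none of the tutorial's actual TensorFlow code is reproduced here).

```python
class Node:
    """One operation in a toy computation graph."""
    def __init__(self, op, inputs=()):
        self.op, self.inputs = op, inputs

def placeholder():
    return Node("placeholder")

def add(a, b):
    return Node("add", (a, b))

def mul(a, b):
    return Node("mul", (a, b))

def run(node, feed):
    """Evaluate a node, pulling placeholder values from `feed`
    (the analogue of session.run with a feed_dict)."""
    if node.op == "placeholder":
        return feed[node]
    args = [run(n, feed) for n in node.inputs]
    return args[0] + args[1] if node.op == "add" else args[0] * args[1]

# Build the graph once...
x, y = placeholder(), placeholder()
z = add(mul(x, x), y)  # z = x*x + y
# ...then run it with concrete values
print(run(z, {x: 3.0, y: 4.0}))  # -> 13.0
```

The point of the deferred style is that the same graph can be run many times with different inputs; TensorFlow additionally compiles and differentiates the graph, which this toy version does not.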

Vincent Wan is a Research Scientist at Google, London. He received his PhD from Sheffield University in 2002 on speaker verification using support vector machines, after having completed a BA in Physics at Oxford University. From 2002 to 2009 he was a Research Associate in the Speech and Hearing Group (SPandH), Sheffield University, doing R&D on speech recognition systems for business-style meetings. From 2009 to 2014 he worked on speech recognition and speech synthesis at Toshiba Research Europe Ltd, Cambridge Research Lab, where he was one of the creators of a photo-realistic avatar that is entirely text-driven and can express emotions. Since 2014 he has been part of the Google text-to-speech synthesis team.
