Machine Intelligence Laboratory

Cambridge University Department of Engineering

Keiichi Tokuda — 6 February 2015

Human-like singing and talking machines: flexible speech synthesis in karaoke, anime, smart phones, video games, digital signage, TV and radio programs, etc.


This talk will give an overview of the statistical approach to flexible speech synthesis. To construct human-like talking machines, speech synthesis systems must be able to generate speech in an arbitrary speaker’s voice, in various speaking styles in different languages, with varying emphasis and focus, and/or with emotional expression. The main advantage of the statistical approach is that such flexibility can easily be realized using mathematically well-defined algorithms. The talk will outline the system architecture and then present recent results and demos.


Keiichi Tokuda is a Professor in the Department of Computer Science at Nagoya Institute of Technology and is currently visiting Google on sabbatical. He is also an Honorary Professor at the University of Edinburgh. He was an Invited Researcher at the National Institute of Information and Communications Technology (NICT), formerly known as the ATR Spoken Language Communication Research Laboratories, Kyoto, Japan, from 2000 to 2013, and was a Visiting Researcher at Carnegie Mellon University from 2001 to 2002. He has been working on statistical parametric speech synthesis since 1995, when he proposed an algorithm for speech parameter generation from HMMs. He has received six paper awards and two achievement awards. He is an IEEE Fellow and an ISCA Fellow.