
Adaptation of an Expressive Single Speaker Deep Neural Network Speech Synthesis System

The tables below contain samples of synthetic speech for two adaptation speakers, one male and one female, and two sentences. Each sentence is synthesised in a neutral tone as well as in five different expressions: anger, fear, happiness, sadness and tenderness.

The three systems, denoted A, B and C, use different techniques for speaker adaptation and expression modelling.

System A models the different expressions as multiple outputs extending from the neutral output of the DNN and adapts to a new speaker using Learning Hidden Unit Contributions (LHUC).
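
As a rough illustration of the LHUC idea, the sketch below scales each hidden unit's output by a speaker-specific amplitude. It assumes the commonly used formulation in which the amplitude is 2*sigmoid(r), so it lies between 0 and 2 and equals 1 when r = 0. The function names, the tanh activation and the plain-list layer representation are assumptions made for this example and are not taken from the paper.

```python
import math

def lhuc_amplitude(r):
    """Map an unconstrained per-unit parameter r to an amplitude in (0, 2)."""
    return 2.0 / (1.0 + math.exp(-r))

def lhuc_layer(x, weights, biases, lhuc_params):
    """Forward pass of one hidden layer with LHUC scaling.

    weights holds one row of input weights per hidden unit.
    lhuc_params holds one speaker-specific parameter per hidden unit.
    The tanh activation is an assumption for this sketch.
    """
    outputs = []
    for row, bias, r in zip(weights, biases, lhuc_params):
        pre_activation = sum(w * xi for w, xi in zip(row, x)) + bias
        outputs.append(lhuc_amplitude(r) * math.tanh(pre_activation))
    return outputs
```

With every entry of lhuc_params set to 0 the layer behaves exactly like the unadapted network. In this formulation, adaptation to a new speaker would update only lhuc_params while the weights and biases stay fixed.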

System B models the different expressions using an expression tag at the input of the network and adapts to a new speaker by adding and training a subset of neurons for the novel speaker.
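
The two ingredients of System B can be sketched as follows. This is a minimal illustration only: the one-hot encoding of the expression tag, the inclusion of neutral in the tag, the zero initialisation of new units and all function names are assumptions, since this page does not specify them.

```python
EXPRESSIONS = ["neutral", "anger", "fear", "happiness", "sadness", "tenderness"]

def with_expression_tag(features, expression):
    """Append a one-hot expression tag to the network input features."""
    if expression not in EXPRESSIONS:
        raise ValueError("unknown expression: " + expression)
    tag = [1.0 if name == expression else 0.0 for name in EXPRESSIONS]
    return list(features) + tag

def add_speaker_units(weights, biases, n_new, init=0.0):
    """Return a copy of a layer extended with n_new extra hidden units.

    weights holds one row of input weights per hidden unit.
    The returned trainable flags mark only the new units, so the
    original units would stay fixed during speaker adaptation.
    """
    n_inputs = len(weights[0])
    new_weights = [list(row) for row in weights]
    new_weights += [[init] * n_inputs for _ in range(n_new)]
    new_biases = list(biases) + [init] * n_new
    trainable = [False] * len(weights) + [True] * n_new
    return new_weights, new_biases, trainable
```

For example, with_expression_tag([0.5, 0.2], "fear") returns the two input features followed by a six-element tag with a 1.0 in the third position. In a full network the following layer would also need extra input weights to read the added units; that step is omitted here.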

System C models the different expressions by denoting a subset of the neurons in each layer as pertaining to a specific expression and adapts to a new speaker by adding and training a subset of neurons for the novel speaker.
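
One way to picture the expression-specific neurons of System C is as a mask over the hidden units of a layer. The sketch below assumes a block of shared units followed by one equally sized block per expression, with the blocks of unselected expressions zeroed out and the neutral tone using the shared units only. The layout, the block sizes and the function names are hypothetical and are not taken from the paper.

```python
EXPRESSIONS = ["anger", "fear", "happiness", "sadness", "tenderness"]

def expression_mask(n_shared, n_per_expression, expression):
    """Build a 0/1 mask over the hidden units of one layer.

    Shared units are always active. Only the block belonging to the
    requested expression is active; "neutral" activates no block.
    """
    mask = [1.0] * n_shared
    for name in EXPRESSIONS:
        active = 1.0 if name == expression else 0.0
        mask.extend([active] * n_per_expression)
    return mask

def apply_mask(activations, mask):
    """Zero the hidden units that do not belong to the active expression."""
    return [a * m for a, m in zip(activations, mask)]
```

For instance, expression_mask(2, 1, "fear") keeps the two shared units and the single fear unit and zeroes the other four. Speaker adaptation by adding neurons could then reuse the same kind of layer extension as described for System B.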

Please see the full paper for further details.


Sentence 1 - Male
System    Neutral    Anger    Fear    Happiness    Sadness    Tenderness
A
B
C

Sentence 1 - Female
System    Neutral    Anger    Fear    Happiness    Sadness    Tenderness
A
B
C

Sentence 2 - Male
System    Neutral    Anger    Fear    Happiness    Sadness    Tenderness
A
B
C

Sentence 2 - Female
System    Neutral    Anger    Fear    Happiness    Sadness    Tenderness
A
B
C

Contact Information
email: Jonathan Parker (jwp37)
address: Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ