Reference #2

R. E. Donovan & P. C. Woodland (1995). Automatic Speech Synthesiser Parameter Estimation using HMMs. Proc. ICASSP '95, pp. 640-643, Detroit.

ABSTRACT: This paper presents a new approach to speech synthesis which uses a set of decision-tree state-clustered triphone HMMs to automatically segment a single-speaker speech database into sub-word units suitable for use in a synthesiser. Parameters are then obtained for each of these sub-word units from the segmented database, enabling a basic synthesis system to be constructed. This automatic generation of synthesis parameters means that the system can easily be retrained on a new speaker, whose voice it then mimics. It also means that a very large number of sub-word units can be used, which enables more precise context modelling than was previously possible.
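
NOTE: The step "parameters are then obtained for each of these sub-word units from the segmented database" can be pictured with a minimal sketch. The abstract does not say how the parameters are computed, so everything here is an assumption made for illustration: the function name estimate_unit_parameters, the (label, start, end) segment format standing in for an HMM alignment, the triphone-style labels, and the use of a plain per-unit mean of the feature frames.

```python
def estimate_unit_parameters(frames, segments):
    """Average the feature frames assigned to each sub-word unit.

    frames:   list of per-frame feature vectors (lists of floats).
    segments: list of (unit_label, start_frame, end_frame) tuples with an
              exclusive end index. This format is hypothetical; it stands in
              for the output of an HMM-based segmentation.
    Returns a dict mapping each unit label to its mean feature vector.
    Averaging is an assumption, not the method stated in the paper.
    """
    sums = {}
    counts = {}
    for label, start, end in segments:
        for vec in frames[start:end]:
            if label not in sums:
                sums[label] = [0.0] * len(vec)
                counts[label] = 0
            # Accumulate this frame into the running total for the unit.
            sums[label] = [s + v for s, v in zip(sums[label], vec)]
            counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


# Toy data: four 2-dimensional frames and three aligned segments.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
segments = [("k-ae+t", 0, 2), ("ae-t+sil", 2, 3), ("k-ae+t", 3, 4)]
params = estimate_unit_parameters(frames, segments)
```

Because the parameters are derived only from the segmented recordings, rerunning the same function on another speaker's database would give a new parameter set, which is the retraining property the abstract describes.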