CONTEXT-DEPENDENT CLASSES IN A HYBRID RECURRENT NETWORK-HMM SPEECH RECOGNITION SYSTEM
Dan Kershaw, Mike Hochberg & Tony Robinson
A modular method for incorporating context-dependent phone classes in the CUED connectionist-HMM hybrid speech recognition system is introduced. The current CUED connectionist-HMM hybrid system performs well on large vocabulary speech recognition tasks. Although the recurrent framework does model acoustic context internally (mainly in the hidden state vector), the targets are currently context independent. It is proposed that adding phonetic-context-dependent targets to the recurrent network would improve modelling, as is seen when comparing equivalent monophone and triphone HMM systems.
This report discusses the methods necessary to introduce context-dependent outputs into the hybrid system. It focusses on two main issues: first, which context classes should be modelled, and which of them best suit the recurrent framework; second, given a set of context classes, which mechanism should be employed to model them. A decision-tree based approach was used to cluster the different context classes of a phone. The final training strategy involved a modular solution, whereby single-layer networks were trained on the state vector to discriminate between the different context classes, given the phone class.
Some initial experiments show an average reduction of around 16% in word error rate on some ARPA Wall Street Journal tasks. The new context-dependent system still has far fewer parameters than any equivalent HMM system and, owing to the improved modelling, decodes more than twice as fast as the context-independent system.
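The modular scheme described above can be illustrated with a small sketch. It assumes, since the abstract does not state the combination rule, that the context-dependent posterior is obtained as the product P(phone, context | x) = P(phone | x) P(context | phone, x), where the first factor comes from the existing recurrent network and the second from a single-layer softmax module fed with the state vector. All names, dimensions, phone labels and random weights below are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_posteriors(state, weights, biases):
    """Single-layer network: P(context class | phone, state vector)."""
    scores = [
        sum(w * s for w, s in zip(row, state)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


def joint_posteriors(phone_post, state, modules):
    """Combine phone posteriors with per-phone context modules.

    Assumed factorisation (not stated in the abstract):
    P(phone, context | x) = P(phone | x) * P(context | phone, x)
    """
    joint = {}
    for phone, p_phone in phone_post.items():
        weights, biases = modules[phone]
        for ctx, p_ctx in enumerate(context_posteriors(state, weights, biases)):
            joint[(phone, ctx)] = p_phone * p_ctx
    return joint


def random_module(rng, n_classes, dim):
    """Hypothetical untrained module: random weights, zero biases."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_classes)]
    return weights, [0.0] * n_classes


# Toy data: a 4-dimensional state vector and two phones whose contexts
# have been clustered into 3 and 2 classes respectively (made-up numbers).
rng = random.Random(0)
state = [rng.uniform(-1.0, 1.0) for _ in range(4)]
phone_post = {"aa": 0.7, "b": 0.3}  # stand-in for the recurrent network outputs
modules = {
    "aa": random_module(rng, 3, len(state)),
    "b": random_module(rng, 2, len(state)),
}

joint = joint_posteriors(phone_post, state, modules)
for (phone, ctx), p in sorted(joint.items()):
    print(f"P({phone}, context {ctx} | x) = {p:.4f}")
```

Because each module's outputs sum to one, summing the joint posteriors over the context classes of a phone recovers that phone's original context-independent posterior, so under this assumption the modules can be trained separately without retraining the recurrent network.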