THE GENERATION AND USE OF REGRESSION CLASS TREES FOR MLLR ADAPTATION

Mark Gales

August 1996

Maximum likelihood linear regression (MLLR) is an adaptation technique suitable for both speaker and environmental model-based adaptation. The models are adapted using a set of linear transformations, estimated in a maximum likelihood fashion from the available adaptation data. As these transformations can capture general relationships between the original model set and the current speaker, or new acoustic environment, they can be effective in adapting all the HMM distributions with limited adaptation data. Two important decisions that must be made are (i) how to cluster components together, such that the components in a cluster all have a similar transformation matrix, and (ii) how many transformation matrices to generate for a given block of adaptation data. This paper addresses both problems. Firstly, it describes two clustering techniques that are optimal in the sense of maximising the likelihood of the adaptation data. The first assigns each component to one of the regression classes, and may be used to generate standard regression class trees. The second performs a "fuzzy" assignment of base class to regression class, so that the transformation associated with each component is a linear combination of a set of transformations. Secondly, two schemes are examined which address the problem of how to determine the number of regression classes, and hence transforms, for a given amount of adaptation data. A cross-validation scheme based on the auxiliary function of the adaptation data is described. Another scheme based on the use of iterative MLLR is also detailed. Neither scheme requires any a priori thresholding information. An initial evaluation of the techniques was performed using the ARPA 1994 test data. On this task, although "good" trees, in terms of the likelihood of the adaptation training data, were generated, neither of the optimal clustering schemes yielded gains in recognition performance. The performance of the cross-validation scheme was found to be comparable to an empirically determined threshold scheme. The best performance was achieved using iterative MLLR, which outperformed both fixed classes and threshold-based schemes.
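
As a rough illustration of the two assignment styles mentioned in the abstract, the sketch below adapts one Gaussian mean with a single regression-class transform (hard assignment) and with a weighted combination of transforms ("fuzzy" assignment). It is not taken from the report: the function names, the extended-mean layout W = [b | A], the example matrices, and the choice of weights summing to one are all assumptions made for the example, and the estimation of the transforms from adaptation data is not shown.

```python
# Illustrative sketch only: names, shapes and numbers are assumptions,
# not taken from the report.

def apply_transform(W, mean):
    """Adapt a mean as W applied to the extended vector [1, mean...].

    With W = [b | A], this computes A * mean + b.
    """
    extended = [1.0] + list(mean)
    return [sum(w * x for w, x in zip(row, extended)) for row in W]


def combine_transforms(transforms, weights):
    """Return the weighted sum of several transforms (the 'fuzzy' case)."""
    rows = len(transforms[0])
    cols = len(transforms[0][0])
    return [
        [sum(wt * T[i][j] for wt, T in zip(weights, transforms))
         for j in range(cols)]
        for i in range(rows)
    ]


# Two hypothetical regression-class transforms for 2-dimensional means.
W_identity = [[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]]       # leaves the mean unchanged
W_shift_scale = [[1.0, 2.0, 0.0],
                 [-1.0, 0.0, 2.0]]   # scales by 2 and adds a bias
transforms = [W_identity, W_shift_scale]

mean = [0.5, -1.0]

# Hard assignment: the component belongs to exactly one regression class.
hard_adapted = apply_transform(transforms[1], mean)

# Fuzzy assignment: the component's transform is a linear combination
# of the class transforms (weights here are arbitrary and sum to one).
fuzzy_W = combine_transforms(transforms, [0.25, 0.75])
fuzzy_adapted = apply_transform(fuzzy_W, mean)

print(hard_adapted, fuzzy_adapted)
```

Because the transform is linear in W, the fuzzy result equals the same weighted combination of the individually adapted means.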

(ftp:) gales_tr263.ps.gz | (http:) gales_tr263.ps.gz

PDF (automatically generated from original PostScript document - may be badly aliased on screen):

(ftp:) gales_tr263.pdf | (http:) gales_tr263.pdf
