USING VTLN FOR BROADCAST NEWS TRANSCRIPTION

Vocal tract length normalisation (VTLN) is a commonly used speaker normalisation approach. It is attractive compared with many normalisation schemes because it typically depends on only a single parameter, the {\em warp factor}, which can therefore be estimated robustly from little data. However, the scheme normally requires the data to be explicitly coded at multiple warp factors. Furthermore, the {\em Jacobian} associated with the VTLN transformation can only be approximated. This paper describes a new, simple, linear approximation to VTLN. The linear approximation allows the {\em Jacobian} to be computed exactly, and it can also be highly efficient both in estimating the warp factors and in applying them. Both the linear and the standard CUED VTLN schemes were evaluated in the 2003 BNE evaluation framework and found to yield similar performance. When used in system combination, both VTLN schemes yielded slight gains over the baseline system.
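
As a minimal sketch of why a linear form gives an exact Jacobian, suppose, purely for illustration, that warping with factor $\alpha$ acts on a feature vector $\mathbf{o}_t$ as multiplication by a square matrix $\mathbf{A}_\alpha$. The symbols $\mathbf{A}_\alpha$, $\mathbf{o}_t$, $\lambda$ (the acoustic model) and $T$ (the number of frames) are notation assumed here, not taken from the paper, and the form of $\mathbf{A}_\alpha$ is left unspecified. By the standard change-of-variables rule, the log-likelihood of the unwarped data is then
\[
\log p(\mathbf{o}_t \mid \lambda, \alpha) = \log p(\mathbf{A}_\alpha \mathbf{o}_t \mid \lambda) + \log \left| \det \mathbf{A}_\alpha \right| ,
\]
where the second term is the log of the Jacobian and is available in closed form. Under the same assumptions, a maximum-likelihood warp factor could be selected as
\[
\hat{\alpha} = \arg\max_{\alpha} \left\{ T \log \left| \det \mathbf{A}_\alpha \right| + \sum_{t=1}^{T} \log p(\mathbf{A}_\alpha \mathbf{o}_t \mid \lambda) \right\} .
\]
Because the transform is applied to features that have already been extracted, a search of this kind would not need the data to be re-coded at each candidate warp factor.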

