Abstract for kim_icslp04

Proc. ICSLP 2004, Jeju, South Korea

USING VTLN FOR BROADCAST NEWS TRANSCRIPTION

D.Y. Kim, S. Umesh, M.J.F. Gales, T. Hain and P.C. Woodland

October 2004

Vocal tract length normalisation (VTLN) is a commonly used speaker normalisation approach. It is attractive compared to many normalisation schemes as it is typically dependent on only a single parameter, allowing the warp factors to be robustly calculated on little data. However, the scheme normally requires explicitly coding the data at multiple warp factors. Furthermore, it is only possible to approximate the Jacobian associated with the VTLN transformation. A new, simple, linear approximation to VTLN is described in this paper. This linear approximation allows the Jacobian to be exactly computed. It can also be highly efficient in terms of warp factor estimation and application of the warp factors. Both the linear and standard CUED VTLN schemes were evaluated in the 2003 BNE evaluation framework and found to yield similar performance. When used in system combination both VTLN schemes yielded slight gains over the baseline system.
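To illustrate why an exactly computable Jacobian matters, the following is a minimal sketch of warp factor selection when each warp factor corresponds to a linear transform of the feature vectors. It is not the scheme from the paper: the transform family `toy_transform` (a diagonal matrix with entries alpha**k), the three-point warp grid, and the single diagonal Gaussian standing in for the acoustic model are all assumptions made for illustration. The point it shows is general: for a linear transform A, the Jacobian term is exactly log|det A|, and it can be added to the likelihood of the transformed features without recoding the data for each warp factor.

```python
import numpy as np

def toy_transform(alpha, dim):
    """Hypothetical stand-in for a linear VTLN transform (not the paper's matrix)."""
    return np.diag(alpha ** np.arange(dim))

def avg_log_likelihood(frames, A, mean, var):
    """Average per-frame log-likelihood of A @ x under a diagonal Gaussian,
    including the exact Jacobian term log|det A|."""
    warped = frames @ A.T
    _, log_det = np.linalg.slogdet(A)
    per_frame = -0.5 * (np.log(2.0 * np.pi * var) + (warped - mean) ** 2 / var).sum(axis=1)
    return per_frame.mean() + log_det

def select_warp(frames, transforms, mean, var):
    """Grid search: return the warp factor with the highest likelihood, plus all scores."""
    scores = {a: avg_log_likelihood(frames, A, mean, var) for a, A in transforms.items()}
    return max(scores, key=scores.get), scores

# Toy setup: a 4-dimensional "model" and a speaker whose true warp factor is 1.1.
dim = 4
mean = np.array([1.0, -0.5, 0.3, 0.0])
var = np.ones(dim)
transforms = {a: toy_transform(a, dim) for a in (0.9, 1.0, 1.1)}

rng = np.random.default_rng(0)
normalised = rng.normal(mean, np.sqrt(var), size=(5000, dim))
# Observed frames are the normalised frames passed through the inverse true transform.
observed = normalised @ np.linalg.inv(transforms[1.1]).T

best, scores = select_warp(observed, transforms, mean, var)
print("selected warp factor:", best)
```

Only one copy of the observed features is needed here; each candidate warp factor is applied as a matrix multiplication. Dropping the `log_det` term would bias the search towards transforms that shrink the features, which is the reason the Jacobian has to be accounted for.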


