Automatic Transcription of Czech, Russian, and Slovak Spontaneous Speech in the MALACH Project

Download: PDF.

“Automatic Transcription of Czech, Russian, and Slovak Spontaneous Speech in the MALACH Project” by J. Psutka, P. Ircing, J.V. Psutka, J. Hajic, W. Byrne, and J. Mirovski. In Proceedings of EUROSPEECH, 2005.

Abstract

This paper describes the 3.5-year effort put into building LVCSR systems for recognition of spontaneous speech of Czech, Russian, and Slovak witnesses of the Holocaust in the MALACH project. For processing the colloquial, highly emotional, and heavily accented speech of elderly people, which contains many non-speech events, we have developed techniques that very effectively handle both non-speech events and colloquial and accented variants of uttered words. Manual transcripts, one of the main sources for language modeling, were automatically “normalized” using a standardized lexicon, which brought about a 2 to 3% reduction of the word error rate (WER). The subsequent interpolation of such LMs with models built from an additional collection (consisting of topically selected sentences from general text corpora) resulted in an additional improvement in performance of up to 3%.
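The interpolation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes simple linear interpolation of two unigram models, which the abstract does not specify; the toy sentences, the function names `unigram_model` and `interpolate`, and the weight of 0.7 are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def unigram_model(sentences):
    """Maximum-likelihood unigram probabilities from tokenized sentences."""
    counts = Counter(word for sentence in sentences for word in sentence)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(p_transcripts, p_additional, weight):
    """Linear mix: weight * P_transcripts(w) + (1 - weight) * P_additional(w)."""
    vocabulary = set(p_transcripts) | set(p_additional)
    return {
        word: weight * p_transcripts.get(word, 0.0)
        + (1.0 - weight) * p_additional.get(word, 0.0)
        for word in vocabulary
    }


# Toy data, invented for illustration only (not from the MALACH corpus).
transcripts = [["we", "were", "taken", "to", "the", "camp"]]
additional = [["the", "train", "left", "the", "town"]]

# The weight 0.7 is an arbitrary assumption; in practice such a weight
# would typically be tuned on held-out data.
mixed = interpolate(
    unigram_model(transcripts), unigram_model(additional), weight=0.7
)
```

Because both inputs are probability distributions and the two weights sum to one, the mixed model is also a distribution. Words seen only in the additional collection receive non-zero probability, which is one plausible reason such interpolation can help.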


BibTeX entry:

@inproceedings{eurosp05malachCRS,
   author = {J. Psutka and P. Ircing and J.V. Psutka and J. Hajic and W.
	Byrne and J. Mirovski},
   title = {Automatic Transcription of {C}zech, {R}ussian, and {S}lovak
	Spontaneous Speech in the {MALACH} Project},
   booktitle = {Proceedings of EUROSPEECH},
   pages = {(4 pages)},
   year = {2005}
}
