Building LVCSR Systems for transcription of spontaneously produced Russian witnesses in the MALACH project: initial steps and first results

“Building LVCSR Systems for transcription of spontaneously produced Russian witnesses in the MALACH project: initial steps and first results” by J. Psutka, I. Iljuchin, P. Ircing, J.V. Psutka, V. Trejbal, W. Byrne, J. Hajic, and S. Gustman. In Proceedings of the Text, Speech, and Dialog Workshop, 2003, pp. 214-219 (6 pages).

Abstract

The MALACH project uses the world's largest digital archive of video oral histories collected by the Survivors of the Shoah Visual History Foundation (VHF) and attempts to access such archives by advancing the state-of-the-art in Automatic Speech Recognition and Information Retrieval. This paper discusses the initial steps and first results in building large vocabulary continuous speech recognition (LVCSR) systems for the transcription of Russian witnesses. As the third language processed in the MALACH project (following English and Czech), Russian has posed new ASR challenges, especially in phonetic modeling. Although most of the Russian testimonies were provided by native Russian survivors, the speakers come from many different regions and countries, resulting in a diverse collection of accented spontaneous Russian speech.

BibTeX entry:

@inproceedings{tsd03_ruasr,
   author = {J. Psutka and I. Iljuchin and P. Ircing and J.V. Psutka and
	V. Trejbal and W. Byrne and J. Hajic and S. Gustman},
   title = {Building {LVCSR} Systems for transcription of spontaneously
	produced {R}ussian witnesses in the {MALACH} project: initial
	steps and first results},
   booktitle = {Proceedings of the Text, Speech, and Dialog Workshop},
   pages = {214--219},
   year = {2003}
}
