Automatic Transcription of Czech Language Oral History in the MALACH Project: Resources and Initial Experiments

Download: PDF.

“Automatic Transcription of Czech Language Oral History in the MALACH Project: Resources and Initial Experiments” by J. Psutka, P. Ircing, J. Psutka, V. Radova, W. Byrne, J. Hajic, S. Gustman, and B. Ramabhadran. In Proceedings of the Text, Speech, and Dialog Workshop, 2002.

Abstract

In this paper we describe the initial stages of the ASR component of the MALACH project. This project will attempt to provide improved access to the large multilingual spoken archives collected by the Survivors of the Shoah Visual History Foundation by advancing the state of the art in automated speech recognition. In order to train the ASR system, it is necessary to manually transcribe a large amount of speech data, identify the appropriate vocabulary, and obtain relevant text for language modeling. We give a detailed description of the speech annotation process; show the specific properties of the spontaneous speech contained in the archives; and present baseline speech recognition results.

BibTeX entry:

@inproceedings{czasr_tsd02,
   author = {J. Psutka and P. Ircing and J. Psutka and V. Radova and W. Byrne and J. Hajic and S. Gustman and B. Ramabhadran},
   title = {Automatic Transcription of {C}zech Language Oral History in the {MALACH} Project: Resources and Initial Experiments},
   booktitle = {Proceedings of the Text, Speech, and Dialog Workshop},
   pages = {(8 pages)},
   year = {2002}
}