A Dialectal Chinese Speech Recognition Framework

Download: PDF.

“A Dialectal Chinese Speech Recognition Framework” by J. Li, F. Zheng, W. Byrne, and D. Jurafsky. Journal of Computer Science and Technology (Science Press, Beijing, China), no. 1, Jan. 2006, pp. 106-115 (10 pages).

Abstract

This paper proposes and studies a framework for dialectal Chinese speech recognition in which a relatively small dialectal Chinese speech corpus (that is, a corpus of Chinese influenced by the speakers' native dialect) and dialect-related knowledge are used to convert a standard Chinese (Putonghua, abbreviated PTH) speech recognizer into a dialectal Chinese speech recognizer. The knowledge comes from two sources: human experts and a small dialectal Chinese corpus. It covers four levels: the phonetic level, the lexicon level, the language level, and the acoustic decoder level. The paper takes Wu dialectal Chinese (WDC) as an example target language, with the goal of deriving an acceptable WDC speech recognizer from an existing PTH speech recognizer. Based on the Initial-Final structure of the Chinese language and a study of how dialectal Chinese speakers speak Putonghua, we propose to use context-independent PTH-IF mappings (where IF denotes either a Chinese Initial or a Chinese Final), context-independent WDC-IF mappings, and syllable-dependent WDC-IF mappings, obtained from either experts or data, and to combine these with surface-form based maximum likelihood linear regression (MLLR) acoustic model adaptation. The IF mappings introduce a multi-pronunciation lexicon whose size can create confusion among entries and thereby degrade performance; to limit its size, we propose a Multi-Pronunciation Expansion (MPE) method based on accumulated uni-gram probability (AUP). Compared with the original PTH speech recognizer, the resulting WDC speech recognizer achieves an absolute Character Error Rate (CER) reduction of over 10% when recognizing WDC, with a CER increase of only 0.62% when recognizing PTH. The proposed framework and methods are intended to work not only for Wu dialectal Chinese but also for other varieties of dialectal Chinese and even for other languages.
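The abstract does not spell out the MPE procedure, so the following is only a rough sketch of one plausible reading: rank words by uni-gram probability, keep the most frequent words whose accumulated probability stays within a threshold, and give only those words extra pronunciations derived from IF mappings. The function names, the threshold, the toy lexicon, and the retroflex-to-dental mappings (`zh`→`z`, `sh`→`s`) are all illustrative assumptions, not values or definitions taken from the paper.

```python
from itertools import product


def select_words_by_aup(unigram_probs, threshold):
    """Pick the most frequent words whose accumulated uni-gram
    probability does not exceed `threshold` (assumed selection rule)."""
    selected = set()
    accumulated = 0.0
    ranked = sorted(unigram_probs.items(), key=lambda kv: kv[1], reverse=True)
    for word, prob in ranked:
        if accumulated + prob > threshold:
            break
        accumulated += prob
        selected.add(word)
    return selected


def expand_lexicon(lexicon, if_mappings, selected):
    """Give selected words every pronunciation variant implied by the
    IF mappings; all other words keep only their canonical form."""
    expanded = {}
    for word, units in lexicon.items():
        if word in selected:
            # Each unit may stay canonical or be replaced by a mapped unit.
            options = [[u] + if_mappings.get(u, []) for u in units]
            expanded[word] = [tuple(v) for v in product(*options)]
        else:
            expanded[word] = [tuple(units)]
    return expanded


# Hypothetical toy data: canonical pronunciations as (Initial, Final) pairs.
lexicon = {
    "shi": ["sh", "i"],
    "zhong": ["zh", "ong"],
    "guo": ["g", "uo"],
    "ren": ["r", "en"],
}
unigram_probs = {"shi": 0.5, "zhong": 0.25, "guo": 0.125, "ren": 0.125}
# Hypothetical context-independent IF mappings (canonical -> dialectal).
if_mappings = {"zh": ["z"], "sh": ["s"]}

selected = select_words_by_aup(unigram_probs, threshold=0.75)
expanded = expand_lexicon(lexicon, if_mappings, selected)
```

Under these assumptions only the two most frequent toy words receive a second pronunciation, so the lexicon grows by two entries instead of one per mappable word. Raising the threshold trades a larger, more confusable lexicon for broader coverage of dialectal variants.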


BibTeX entry:

@article{cdasr_jcst05,
   author = {J. Li and F. Zheng and W. Byrne and D. Jurafsky},
   title = {A Dialectal {C}hinese Speech Recognition Framework},
   journal = {Journal of Computer Science and Technology (Science Press, Beijing, China)},
   number = {1},
   pages = {106--115},
   month = jan,
   year = {2006}
}
