Mandarin Pronunciation Modeling Based on the CASS Corpus

Download: PDF.

“Mandarin Pronunciation Modeling Based on the CASS Corpus” by F. Zheng, Z. Song, P. Fung, and W. Byrne. Journal of Computer Science and Technology (Science Press, Beijing, China), vol. 17, no. 3, May 2002. (16 pages).


Pronunciation variability is an important issue that must be addressed when developing practical automatic spontaneous speech recognition systems. In this paper, the factors that may affect recognition performance are analyzed, including those specific to the Chinese language. By studying the INITIAL/FINAL (IF) characteristics of the Chinese language and developing the Bayesian equation, we propose the concepts of generalized INITIAL/FINAL (GIF) and generalized syllable (GS), GIF modeling and IF-GIF modeling, and context-dependent pronunciation weighting, all based on a seed database with high-quality phonetic transcriptions. Using these methods, the Chinese syllable error rate (SER) was reduced by 6.3% and 4.2% compared with GIF modeling and IF modeling respectively when no language model, such as a syllable or word N-gram, is used. The effectiveness of these methods is also demonstrated when more data without phonetic transcription is used to refine the acoustic model using the proposed iterative forced-alignment based transcribing (IFABT) method, achieving a 5.7% SER reduction.


BibTeX entry:

   author = {F. Zheng and Z. Song and P. Fung and W. Byrne},
   title = {Mandarin Pronunciation Modeling Based on the {CASS} Corpus},
   journal = {Journal of Computer Science and Technology (Science Press, Beijing, China)},
   volume = {17},
   number = {3},
   month = may,
   year = {2002},
   note = {(16 pages)}
