The MIL seminar series schedule for Easter Term 2003 was as follows:
|27th May 2003||Nobuaki Minematsu (University of Tokyo)||Phonetic Tree Analysis|| This paper proposes two novel techniques for analyzing the segmental intelligibility of accented pronunciation, both based on new methods of observing and characterizing accented pronunciation. The first technique, Phonetic Tree Analysis, extracts the phonetic tree structure embedded in a student's utterances. Results of analyzing Japanese English clearly visualize well-known habits of Japanese speakers of English. The second technique estimates segmental intelligibility automatically, based not on acoustic matching against native speakers' utterances but on matching between two structures: the phonetic structure extracted from the student's pronunciation and the lexical structure of the target language's vocabulary. The estimation uses the Cohort Model, a model of word perception, and the estimated cohort size is interpreted as the degree of segmental unintelligibility. Experimental results show good agreement between the estimated intelligibility and the segmental proficiency rated by teachers. |
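The cohort-size idea above can be sketched in a few lines: under the Cohort Model, a word's cohort is the set of vocabulary items consistent with its initial phonemes, and phoneme confusions typical of a learner enlarge that set. The vocabulary, confusion sets, and function names below are illustrative assumptions, not taken from the paper.

```python
def normalize(phones, confusions):
    """Map each phoneme to a canonical symbol for any confusion class it
    belongs to, so confused phonemes become indistinguishable."""
    canon = {}
    for group in confusions:  # e.g. {"r", "l"} for an /r/-/l/ merger
        rep = min(group)
        for p in group:
            canon[p] = rep
    return tuple(canon.get(p, p) for p in phones)

def cohort_size(word, vocab, confusions, prefix_len=2):
    """Number of vocabulary words whose (confusion-normalized) initial
    phonemes match those of `word`; a larger cohort means the word is
    harder to identify, i.e. less intelligible."""
    target = normalize(vocab[word][:prefix_len], confusions)
    return sum(1 for phones in vocab.values()
               if normalize(phones[:prefix_len], confusions) == target)

# Toy vocabulary with hypothetical phoneme transcriptions.
vocab = {
    "right": ("r", "ai", "t"),
    "light": ("l", "ai", "t"),
    "rice":  ("r", "ai", "s"),
    "lice":  ("l", "ai", "s"),
    "night": ("n", "ai", "t"),
}

# A speaker who distinguishes /r/ and /l/: cohort of "right" is {right, rice}.
print(cohort_size("right", vocab, confusions=[]))            # 2
# A speaker who merges /r/ and /l/ doubles the cohort, lowering intelligibility.
print(cohort_size("right", vocab, confusions=[{"r", "l"}]))  # 4
```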
|4th June 2003||Dharmendra Kanejiya (Indian Institute of Technology, Delhi)||Latent analysis of syntactic-semantic information|| Syntax and semantics are two important aspects of natural language. For many applications of natural language understanding and modeling, it would be useful to combine these two levels of information in a robust manner. In this talk, I will present syntactically enhanced LSA (SELSA), our approach to joint syntactic-semantic analysis for speech and language processing. SELSA generalizes latent semantic analysis (LSA) by incorporating various levels of syntactic information. I will present two applications of SELSA: (1) natural language understanding and cognitive modeling for automatic evaluation of students' answers in an intelligent tutoring system, and (2) statistical language modeling using large-span syntactic-semantic information for speech recognition. |
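As a toy illustration of the idea behind SELSA: standard LSA factorizes a word-document count matrix with an SVD, and a syntactically enhanced variant can instead use units that condition each word on the syntactic tag of its predecessor, so the same word in different syntactic contexts gets distinct latent representations. The corpus, tags, and this particular choice of conditioning are illustrative assumptions, not the talk's exact formulation.

```python
import numpy as np

def selsa_units(tagged_sentence):
    """Augment each word with the part-of-speech tag of its predecessor,
    so e.g. 'bank' after a determiner and 'bank' after a noun become
    distinct modeling units."""
    units, prev_tag = [], "<s>"
    for word, tag in tagged_sentence:
        units.append(f"{word}|{prev_tag}")
        prev_tag = tag
    return units

def lsa_embed(docs, k):
    """Rank-k LSA: build the unit-document count matrix and truncate its SVD."""
    vocab = sorted({u for d in docs for u in d})
    index = {u: i for i, u in enumerate(vocab)}
    M = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for u in doc:
            M[index[u], j] += 1
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] * s[:k], index  # latent vectors, one row per unit

# Two tiny hand-tagged sentences: "the bank lends" / "the river bank".
docs = [selsa_units([("the", "DET"), ("bank", "NOUN"), ("lends", "VERB")]),
        selsa_units([("the", "DET"), ("river", "NOUN"), ("bank", "NOUN")])]
vecs, index = lsa_embed(docs, k=2)

# "bank" occurs as two separate syntactically conditioned units:
print(sorted(u for u in index if u.startswith("bank")))  # ['bank|DET', 'bank|NOUN']
```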
|17th June 2003||Andrew Liu (MIL)||Automatic Complexity Control for LVCSR Systems|| Designing a large vocabulary speech recognition system is a highly complex problem, and many techniques affect both recognition performance and system complexity. Automatic complexity control criteria are needed to quickly predict the recognition performance ranking of systems of varying complexity, in order to select the model structure with the minimum word error. This seminar will first give an overview of existing complexity control criteria, focusing on penalized-likelihood techniques within the Bayesian learning framework. Various approximation schemes for the Bayesian evidence integral will be discussed, including the Bayesian Information Criterion (BIC), the Laplace approximation and the variational approximation. Limitations of these penalized-likelihood criteria will be discussed, and marginalization of discriminative training criteria in parameter space will be proposed as a new approach. The marginalization of these criteria can be approximated via the marginalization of a lower bound on the corresponding criterion. This lower bound can be derived using a generalized EM algorithm and is related to standard discriminative training auxiliary functions, from which the Extended Baum-Welch re-estimation formulae can be derived. Detailed discussion will then be given to marginalizing specific criteria such as Maximum Mutual Information (MMI) and Minimum Word Error (MWE). Initial experimental results on a typical LVCSR task will be presented, with the number of Gaussian components per state and the retained subspace dimensionality of an HLDA system optimized at both the global and local level. Marginalization of the MMI criterion is shown to give the lowest ranking prediction error with the minimum computational cost. |
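As a small, self-contained illustration of one criterion mentioned above, the Bayesian Information Criterion scores a model as its log-likelihood minus (k/2) log N, where k is the number of free parameters and N the number of training samples; applied to a one-dimensional Gaussian mixture it selects the number of components. The toy data, the EM details, and the parameter counting below are illustrative assumptions, not the seminar's LVCSR setup.

```python
import math
import random

def em_gmm_1d(x, m, iters=100):
    """Fit an m-component 1-D Gaussian mixture with EM; return its log-likelihood."""
    lo, hi = min(x), max(x)
    # Deterministic initialization: means spread evenly over the data range.
    mu = [lo + (hi - lo) * (j + 0.5) / m for j in range(m)]
    var = [((hi - lo) / m) ** 2] * m
    w = [1.0 / m] * m

    def dens(xi, j):
        return (w[j] * math.exp(-(xi - mu[j]) ** 2 / (2 * var[j]))
                / math.sqrt(2 * math.pi * var[j]))

    for _ in range(iters):
        # E-step: component responsibilities for each point.
        resp = []
        for xi in x:
            p = [dens(xi, j) for j in range(m)]
            s = sum(p) + 1e-300
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means and (floored) variances.
        for j in range(m):
            nj = max(sum(r[j] for r in resp), 1e-9)
            w[j] = nj / len(x)
            mu[j] = sum(r[j] * xi for r, xi in zip(resp, x)) / nj
            var[j] = max(sum(r[j] * (xi - mu[j]) ** 2
                             for r, xi in zip(resp, x)) / nj, 1e-3)
    return sum(math.log(sum(dens(xi, j) for j in range(m)) + 1e-300) for xi in x)

def bic(log_lik, m, n):
    """BIC score: log-likelihood penalized by half the parameter count times log N."""
    k = 3 * m - 1  # free parameters: (m - 1) weights + m means + m variances
    return log_lik - 0.5 * k * math.log(n)

# Toy data: two well-separated clusters, so BIC should prefer m = 2 over
# the underfit m = 1 and the over-complex m = 3.
rng = random.Random(1)
data = ([rng.gauss(0.0, 1.0) for _ in range(200)]
        + [rng.gauss(8.0, 1.0) for _ in range(200)])
scores = {m: bic(em_gmm_1d(data, m), m, len(data)) for m in (1, 2, 3)}
print(max(scores, key=scores.get))  # 2
```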