MIL Speech Seminars 2003-2004

The MIL Speech Seminar series schedule for the Long Vacation 2004 was as follows:

August 3rd 2004	Tomoki Toda (Nagoya Institute of Technology)	A Mapping Method Based on Maximum Likelihood with A Gaussian Mixture Model	Statistical feature mapping is a useful technique for many applications, e.g., voice conversion. A well-known mapping method is the conversion algorithm based on a Gaussian Mixture Model (GMM) proposed by Stylianou. Although the GMM-based mapping can convert spectra more appropriately than other methods, e.g., Vector Quantization and Linear Multivariate Regression, speech quality still deteriorates because of several problems: 1) the mapping function is not supported by a proper statistical model, 2) inappropriate spectral movements are caused by frame-based conversion, and 3) the converted spectra are excessively smoothed by the statistical modeling. To address these problems, I propose a novel feature mapping method based on Maximum Likelihood (ML) with a GMM. In the proposed method, ML-based feature conversion is performed with not only static but also dynamic feature statistics to estimate appropriate spectral movements. Moreover, the over-smoothing effect can be alleviated by introducing a global variance feature of the converted spectra. The effectiveness of the proposed method is demonstrated by the results of subjective and objective evaluations. In this talk, I show results of spectral determination from articulatory movements and acoustic-to-articulatory inversion mapping as well as the results of voice conversion.
August 24th 2004	Chandra Sekhar (Dept. of Computer Science and Engineering, I.I.T. Madras)	Recognition of Subword Units of Speech using Support Vector Machines	We address two issues in acoustic modeling of subword units of speech, such as Consonant-Vowel (CV) units, using support vector machines (SVMs). The first issue is related to the development of a multi-class pattern recognition system using SVMs for a large number of CV classes. In conventional approaches to multi-class pattern recognition using SVMs, learning involves discrimination of each class against all the other classes. We propose a close-class-set discrimination method suitable for large-class-set pattern recognition problems. In the proposed method, learning involves discrimination of each class against a subset of classes confusable with it and included in its close-class-set. We consider different criteria for identification of close-class-sets, such as the description of classes, the similarity measure between example patterns of classes, and the margin of pairwise classification SVMs. We study the effectiveness of the proposed method in reducing the complexity of multi-class pattern recognition systems based on the one-against-the-rest and one-against-one approaches. The second issue addressed in this work is related to classification of varying-duration segments of speech using SVMs. Commonly used methods for mapping the varying-duration segments into fixed-dimension patterns may lead to loss of crucial information necessary for classification. We propose a method in which the representation of a segment of speech is considered as a trajectory in a multidimensional space. A fixed-dimension pattern vector derived from the outer-product operation on the matrix representation of a multidimensional trajectory is given as input to the SVMs. For acoustic modeling of speech segments consisting of multiple phonemes, the outer-product operation is carried out for the trajectory of each phoneme. The effectiveness of the proposed method is demonstrated in recognition of utterances of CV type units of speech.
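
The fixed-dimension idea in the second abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so the following rests on assumptions: a segment is taken to be a list of T frames of d features each, and the "outerproduct operation" is read as the sum of the per-frame outer products (X^T X for a T x d matrix X), flattened to a vector of length d*d. The function names are hypothetical, and any normalisation or use of matrix symmetry in the actual work is not modelled here.

```python
from typing import List

# A trajectory is T frames, each a d-dimensional feature vector (assumed layout).
Trajectory = List[List[float]]


def outer_product_pattern(frames: Trajectory) -> List[float]:
    """Map a T x d trajectory to a pattern vector of fixed length d*d.

    Accumulates the outer product x_t x_t^T over all frames, so the result
    size depends only on d and never on the segment duration T.
    """
    if not frames:
        raise ValueError("trajectory must contain at least one frame")
    d = len(frames[0])
    if any(len(frame) != d for frame in frames):
        raise ValueError("all frames must have the same dimension")

    acc = [[0.0] * d for _ in range(d)]
    for x in frames:
        for i in range(d):
            for j in range(d):
                acc[i][j] += x[i] * x[j]

    # Flatten row by row into a single vector usable as classifier input.
    return [value for row in acc for value in row]


def multi_phoneme_pattern(segments: List[Trajectory]) -> List[float]:
    """Concatenate per-phoneme patterns (e.g. the consonant and vowel of a CV unit)."""
    pattern: List[float] = []
    for segment in segments:
        pattern.extend(outer_product_pattern(segment))
    return pattern


# Two segments of different duration (2 and 3 frames) with d = 2.
short_segment = [[1.0, 2.0], [0.0, 1.0]]
long_segment = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]]
print(len(outer_product_pattern(short_segment)))  # 4
print(len(outer_product_pattern(long_segment)))   # 4
```

Both segments yield vectors of the same length despite their different durations, which is the property a fixed-input classifier such as an SVM needs. Concatenating one such vector per phoneme, as in `multi_phoneme_pattern`, is one plausible reading of carrying out the operation for the trajectory of each phoneme.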