Speech and Language Processing - Module 4F11

Module Lecturers

Prof. Phil Woodland (pcw@eng.cam.ac.uk) and Dr. Bill Byrne (bill.byrne@eng.cam.ac.uk)

Syllabus

Lecture 1: Overview/Introduction.
Introduction & Applications. Speech production mechanisms, types of speech sound, source-filter model, applications of speech and text processing.
Lecture notes available in [lect1.pdf]

Lectures 2-3: Speech Analysis.
FFT-based methods. All-pole filter models, calculation of LP coefficients. LP spectrum. Cepstral analysis. Front-end analysis for speech recognition (MFCCs).
Lecture notes available in [lect23.pdf]

Lectures 4-5: ASR Introduction and Isolated Word Recognition.
Statistical speech recognition, task complexity. Hidden Markov models. Continuous density HMM parameter estimation, Baum-Welch algorithm, Viterbi algorithm, Gaussian mixture models for HMMs.
Lecture notes available in [lect45.pdf]

Lecture 6: Sub-word Acoustic Models.
Large vocabulary speech recognition, continuous speech training, limitations of word models, context-dependent phones, parameter tying, WSJ performance.
Lecture notes available in [lect6.pdf]

Lecture 7: Language Models.
Perplexity, N-gram language models, discounting, interpolation.
Lecture notes available in [lect7.pdf]

Lecture 8: ASR Search Issues.
Continuous speech recognition. Pruning. Integrating context-dependent HMMs and N-gram language models.
Lecture notes available in [lect8.pdf]

Lectures 9-10: Weighted Finite State Transducers for Speech and Language Processing.
Efficient realization of probabilistic models for sequence processing. Transduction, composition, determinization, minimum-cost search. WFSTs in ASR search and other language processing applications.
Lecture notes available in [lect9-10.pdf]

Lecture 11: Introduction to Statistical Machine Translation.
Statistical pattern processing approaches to translation. Automatic evaluation of translation quality.
Lecture notes available in [lect11.pdf]

Lecture 12: SMT - Alignment.
Parallel text as training data. Models of word and phrase alignment in translation. Model estimation procedures.
Lecture notes available in [lect12.pdf]

Lecture 13: SMT - Translation.
Phrase-based translation systems. Implementation via WFSTs.
Lecture notes available in [lect13.pdf]

Lecture 14: Text-to-Speech Synthesis.
Introduction to TTS.
Lecture notes available in [lect14.pdf]

Examples papers

There will be two examples papers and two examples classes for the course. Solutions to the examples papers will be available online after the examples classes.

Examples Paper 1 available in [egPaper1.pdf]
Solutions to examples paper 1 available in [egPaper1soln.pdf]

Examples Paper 2 available in [examplepaper2.pdf]
Solutions to examples paper 2 available in [examplepaper2_solns.pdf]

Exam Format

Assessment by 1.5-hour exam: 3 questions from 4.

Course Books

Daniel Jurafsky and James Martin. Speech and Language Processing (Second Edition), Prentice Hall, 2008

Xuedong Huang, Alex Acero and Hsiao-Wuen Hon. Spoken Language Processing, Prentice Hall, 2001

Paul Taylor. Text-to-Speech Synthesis, Cambridge University Press, 2009
