[Univ of Cambridge] [Dept of Engineering]

Speech and Language Processing - Module 4F11


Module Lecturers

Prof. Phil Woodland pcw@eng.cam.ac.uk and Dr. Bill Byrne bill.byrne@eng.cam.ac.uk

Syllabus

  • Lecture 1: Overview/Introduction.
    Introduction & Applications. Speech production mechanisms, types of speech sound, source-filter model, applications of speech and text processing.
    Lecture notes available in [lect1.pdf].
  • Lectures 2-3: Speech Analysis.
    FFT based methods. All-pole filter models, calculation of LP coefficients. LP Spectrum. Cepstral analysis. Front-end analysis for speech recognition (MFCCs).
    Lecture notes available in [lect23.pdf]
  • Lectures 4-5: ASR Introduction and Isolated Word Recognition.
    Statistical speech recognition, task complexity. Hidden Markov models. Continuous density HMM parameter estimation, Baum-Welch algorithm, Viterbi algorithm, Gaussian mixture models for HMMs.
    Lecture notes available in [lect45.pdf]
  • Lecture 6: Sub-word Acoustic Models.
    Large vocabulary speech recognition, continuous speech training, limitations of word models, context dependent phones, parameter tying, WSJ performance.
    Lecture notes available in [lect6.pdf]
  • Lecture 7: Language Models.
    Perplexity, N-gram language models, discounting, interpolation.
    Lecture notes available in [lect7.pdf]
  • Lecture 8: ASR Search Issues.
    Continuous speech recognition. Pruning. Integrating context dependent HMMs and N-gram language models.
    Lecture notes available in [lect8.pdf]
  • Lectures 9-10: Weighted Finite State Transducers for Speech and Language Processing.
    Efficient realization of probabilistic models for sequence processing. Transduction, composition, determinization, minimum-cost search. WFSTs in ASR search and other language processing applications.
    Lecture notes available in [lect9-10.pdf]
  • Lecture 11: Introduction to Statistical Machine Translation.
    Statistical pattern processing approaches to translation. Automatic evaluation of translation quality.
    Lecture notes available in [lect11.pdf]
  • Lecture 12: SMT - Alignment.
    Parallel text as training data. Models of word and phrase alignment in translation. Model estimation procedures.
    Lecture notes available in [lect12.pdf]
  • Lecture 13: SMT - Translation.
    Phrase-based translation systems. Implementation via WFSTs.
    Lecture notes available in [lect13.pdf]
  • Lecture 14: Text-to-Speech Synthesis.
    Introduction to TTS.
    Lecture notes available in [lect14.pdf]

Examples Papers

There will be two examples papers and two examples classes for the course. The solutions to the examples papers will be available on-line (after examples classes).

Examples Paper 1 available in [egPaper1.pdf].
Solutions to examples paper 1 available in [egPaper1soln.pdf].

Examples Paper 2 available in [examplepaper2.pdf].
Solutions to examples paper 2 available in [examplepaper2_solns.pdf].

Exam Format

Assessment is by a 1.5-hour exam: answer 3 questions from 4.

Course Books

  • Daniel Jurafsky and James Martin. Speech and Language Processing (Second Edition), Prentice Hall, 2008
  • Xuedong Huang, Alex Acero and Hsiao-Wuen Hon, Spoken Language Processing, Prentice Hall, 2001
  • Paul Taylor. Text-to-Speech Synthesis, Cambridge University Press, 2009

