Speech Reading Club
This page contains information about the Speech Reading Club to be run
in Lent term 2005.
For any queries or problems, please contact me by email:
mjfg@eng.cam.ac.uk
The speech reading club is only available to part-time students in their
first year of study. It is not an option for part-time students in their
second year.
If there is a theme, or a topic within a theme, that you are particularly
interested in but that is not listed, contact me. We can discuss including it.
Aims
The aim of the speech reading club is to investigate a specific area
mentioned in the Speech Processing modules, review the state of the art and
understand how it might be applied in a modern speech processing task.
Students are
expected to look at a particular theme, and topic within that theme, and
be able to present a concise summary of the area, along with a detailed
essay on their particular topic.
Structure
The broad structure of the module is described below. The details may vary depending
on the number of students who select the module.
- By 18th November 2004, students will be asked to decide whether they are
taking the Speech Reading Club module. Students who choose the module may also
express a preference for a particular theme/topic. The maximum number of students
allowed to take the Speech Reading Club is 60% of the CSTIT course.
- Students will be divided into groups of 3-5. Each group is associated with
a particular theme, and each student in the group will also be assigned a
particular topic within that theme. Where possible, the assignment of themes
and topics will reflect any preferences expressed. However, not all themes or
topics will be run.
- Papers associated with each theme/topic will be distributed, or made
available on the web, at the start of Lent term. For each theme, a one-hour
discussion session with the theme organiser will be held during weeks 3 and 4
of Lent term. Specific problems may also be discussed via email.
- During weeks 5 to 8 of Lent term, a one- to two-hour presentation session
will be given for each theme. Each session will consist of a 5-10 minute
overview of the theme, jointly prepared by all members of the group. Each
student will then give a 15-minute presentation of their particular topic,
followed by a short discussion of the topic led by the student and the theme
organiser.
ALL students taking the Speech Reading Club module must attend ALL presentations.
- Each student writes an essay that describes the general issues in their
theme area and discusses their particular topic in detail. The maximum length
of the essay is 5000 words. The essay must be handed in by 26th April 2005.
Important dates are:
- 18th November 2004: decision on whether to take the Speech Reading Club module.
- 25th November 2004: list of theme and topic assignments circulated.
- Weeks 5-8 of Lent term: Speech Reading Club presentations.
- 26th April 2005: hand-in date for the Speech Reading Club essays.
Themes
Here is a preliminary list of themes. If more details are required for
any of the themes or topics, contact me or the theme organiser. Suggestions
for themes (areas of speech processing research) not mentioned below are welcome.
Not all themes will be run. The exact choice will depend on the number of
students and their preferences.
Assessment
The assessment for the Speech Reading Club is by essay. The essay
should give an overview of the theme and a detailed discussion of the
particular topic within that theme. The maximum length of the essay is
5000 words. The submission deadline for the essays is 26th April
2005. Late submission will be penalised in the same way as late
submission of practical work.
Theme and Topic References
The following references are quite extensive. If you only want a brief
overview of a theme, look at the reference marked with an asterisk (*),
or at the associated chapter of:
- X. Huang, A. Acero and H.-W. Hon, Spoken Language Processing, Prentice Hall.
Note: not all themes, or topics within a theme, will be run. The topics
may vary slightly depending on the number of people who select a
theme.
Acoustic Modelling
The references given for each of the themes and topics should be considered
as starting points for further investigation. Students are expected to
look at additional papers. If further help is required, contact the
theme organiser.
For additional reading, the proceedings of ICASSP, Eurospeech and ICSLP
from the last few years are available.
Please do not print out copies of the longer papers - contact me first.
SPEECH SYNTHESIS (Paul Taylor):
Topics:
- Acoustics of Speech Production
- basics of how sound waves are produced and how they travel
- source/filter model of speech production
- models of vocal tracts as tubes
- sound waves in tubes
- sound sources, formants, linear prediction
- Prosodic Modelling
- basic prosody models
- automatic recognition of prosody
- synthesis of prosody
- using prosodic information in speech recognition
References:
- ToBI home page
- P. Taylor, Acoustic Theory of Speech Production (book chapter)
- A. W. Black and A. Hunt, Generating F0 contours from ToBI labels using
linear regression.
- P. Taylor, Analysis and synthesis of intonation using the tilt model.
Journal of the Acoustical Society of America, 107(3):1697-1714, 2000.
- Ann Syrdal, Gregor Moehler, Kurt Dusterhoff, Alistair Conkie, and Alan
Intonation modeling. In 3rd ESCA Workshop on Speech Synthesis,
pages 305-310, Jenolan Caves, 1998.
- Comparing CART and Fujisaki Intonation Models for Synthesis
- A Method for Automatic Extraction of Fujisaki-Model Parameters
- Elizabeth Shriberg et al., Can prosody aid the automatic classification
of dialog acts in conversational speech? Language and Speech, 41(3-4), 1998.
R.
- MIT lecture notes
- Huang et al., Spoken Language Processing, Chapter 6
- D. Ellis (lecture notes)
- P. Taylor, Acoustic Theory of Speech Production (book chapter)
- P. Taylor, Introduction to Signals and Filters (book chapter)
DISCRIMINATIVE TRAINING (Mark Gales and Bill Byrne):
Topics:
- Maximum mutual information (MMI) and frame discrimination (FD) training criteria;
- Minimum classification error rate training criterion;
- Discriminative training for large vocabulary systems;
- Discriminative methods for speaker adaptation and feature extraction.
References:
SPEAKER ADAPTATION (Mark Gales):
Topics:
- Speaker clustering, eigenvoices and cluster adaptive training;
- Linear model-based adaptation schemes and adaptive training, maximum likelihood linear
regression (MLLR) and constrained MLLR;
- Maximum a posteriori (MAP) adaptation schemes and extensions, e.g. adaptation
by correlation.
References:
NOISE ROBUSTNESS (Mark Gales):
Topics:
- Enhancement schemes (including model-based enhancement);
- Predictive and adaptive model-based compensation schemes;
- Inherently robust frontends and models.
References:
STATISTICAL MACHINE TRANSLATION (Bill Byrne):
Topics:
- Bitext Alignment
- Word Alignment in Bitext
- Models and Algorithms for Statistical Translation
References:
- W. Gale and K. Church, A Program for Aligning Sentences in Bilingual Corpora.
Association for Computational Linguistics, 1991.
- R. Moore, Fast and Accurate Sentence Alignment of Bilingual Corpora.
Proceedings of the 5th Conference of the Association for Machine Translation
in the Americas (AMTA 2002), Tiburon, California, Springer-Verlag, Heidelberg,
Germany, pp. 135-144.
- P. Brown, S. Della Pietra, V. Della Pietra and R. Mercer, The Mathematics of
Statistical Machine Translation: Parameter Estimation.
Association for Computational Linguistics, 1993.
- F. Och and H. Ney, A Systematic Comparison of Various Statistical Alignment Models.
Association for Computational Linguistics, 2003.
- K. Knight and Y. Al-Onaizan, Translation with Finite-State Devices.
Proceedings of the 4th AMTA Conference, 1998.
- F. Och and H. Ney, The Alignment Template Approach to Statistical Machine Translation.
Computational Linguistics, December 2004, pp. 417-450.
- S. Kumar, Y. Deng and W. Byrne, A weighted finite state transducer translation
template model for statistical machine translation.
Journal of Natural Language Engineering. To appear.
BAYESIAN NETWORKS AND SEGMENT MODELS (Mark Gales):
Topics:
- Dynamic Bayesian networks and graphical models for speech recognition;
- Distributed representations for speech recognition;
- Linear dynamical and factor analysed systems;
- Efficient covariance modelling.
References: