Q6.2: How is speech recognition performed?

A wide variety of techniques are used to perform speech recognition. There are also many types of speech recognition, and many levels at which speech can be recognised, analysed or understood.

Typically, speech recognition starts with the digital sampling of speech. The next stage is acoustic signal processing. Most techniques include some form of spectral analysis, e.g. LPC (Linear Predictive Coding) analysis, MFCC (Mel Frequency Cepstral Coefficients), cochlea modelling and many more.
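As a rough illustration of the sampling and spectral analysis stage, the following Python sketch splits a sampled signal into overlapping frames, applies a Hamming window and computes a power spectrum for each frame. It is not LPC or MFCC analysis, only the short-time spectrum that such methods commonly build on. The function names, the 8 kHz sample rate, the frame length and the test tone are all assumptions chosen for the example.

```python
import cmath
import math

def frames(signal, frame_len, hop):
    """Split a sampled signal into overlapping frames of frame_len samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def power_spectrum(frame):
    """Power spectrum of one frame using a direct DFT (slow, but easy to read)."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hamming(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum

# Illustrative input (an assumption): a 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
FRAME_LEN = 200          # 25 ms per frame at 8 kHz
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(800)]

tone_frames = frames(tone, frame_len=FRAME_LEN, hop=80)
first = power_spectrum(tone_frames[0])

# The strongest bin should correspond to the frequency of the tone.
peak_bin = max(range(len(first)), key=first.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
```

A practical system would use an FFT rather than the direct DFT shown here, and would go on to derive a compact feature vector (for example cepstral coefficients) from each frame.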

The next stage is recognition of phonemes, groups of phonemes and words. This stage can be achieved by many processes such as DTW (Dynamic Time Warping), HMMs (Hidden Markov Models), NNs (Neural Networks), expert systems and combinations of techniques. HMM-based approaches are currently the most commonly used and the most successful.
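To make one of these techniques concrete, here is a minimal Python sketch of Dynamic Time Warping used for isolated-word matching. It assumes each word is represented by a sequence of single numbers, whereas a real system would compare sequences of feature vectors. The templates, the test utterance and the names dtw_distance and recognise are invented for the example.

```python
import math

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two sequences of numbers."""
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # stretch b
                                     cost[i][j - 1],      # stretch a
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]

# Hypothetical stored templates, one feature sequence per word.
templates = {
    "yes": [1, 3, 4, 3, 1],
    "no": [4, 4, 2, 1, 1],
}

def recognise(features):
    """Return the template word whose sequence is closest under DTW."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))

# A slower rendition of the "yes" template: same shape, different timing.
utterance = [1, 1, 3, 3, 4, 4, 3, 1]
word = recognise(utterance)
```

Because DTW is allowed to stretch either sequence in time, the slower utterance still aligns with the "yes" template at zero cost, which is the property that makes the method useful for speech spoken at varying rates.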

Most systems utilise some knowledge of the language, such as a vocabulary, a grammar or statistics about which words tend to follow one another, to aid the recognition process.
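One common form of such language knowledge is a statistical model of word sequences. The Python sketch below trains an add-one smoothed bigram model on a tiny made-up corpus and uses it to choose between two acoustically similar hypotheses. The corpus, the acoustic scores and all function names are assumptions for illustration, not data from any real recogniser.

```python
import math
from collections import Counter

# Tiny illustrative corpus (an assumption, not real training data).
corpus = [
    "recognise speech with a computer",
    "it is hard to recognise speech",
    "a nice beach",
]

def train_bigrams(sentences):
    """Count single words and adjacent word pairs, with a start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def lm_log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a word sequence."""
    vocab = len(unigrams)
    words = ["<s>"] + sentence.split()
    return sum(math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab))
               for prev, word in zip(words, words[1:]))

unigrams, bigrams = train_bigrams(corpus)

# Hypothetical acoustic log-scores: the two hypotheses sound almost alike.
hypotheses = {
    "recognise speech": -10.0,
    "wreck a nice beach": -9.5,
}

def total_score(sentence):
    """Combine the acoustic score with the language model score."""
    return hypotheses[sentence] + lm_log_prob(sentence, unigrams, bigrams)

best = max(hypotheses, key=total_score)
```

Here the acoustic scores alone slightly favour the wrong hypothesis, and the language model tips the decision towards the word sequence that is more plausible given the corpus.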

Some systems try to "understand" speech. That is, they try to convert the words into a representation of what the speaker intended to mean or achieve by what they said.
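A very simple way to picture this step is a set of patterns that map recognised words onto a small structured representation of the speaker's intent. The Python sketch below is only that: the patterns, the intent names and the understand function are hypothetical, and real understanding systems are far more elaborate.

```python
import re

# Hypothetical patterns mapping word sequences to intents.
PATTERNS = [
    (re.compile(r"\b(?:call|phone|ring)\s+(?P<name>\w+)"), "place_call"),
    (re.compile(r"\bwhat time is it\b|\bwhat is the time\b"), "ask_time"),
]

def understand(words):
    """Convert recognised words into a simple intent representation."""
    text = words.lower()
    for pattern, intent in PATTERNS:
        match = pattern.search(text)
        if match:
            # Named groups, if any, become slots in the result.
            return {"intent": intent, **match.groupdict()}
    return {"intent": "unknown"}

call = understand("Please call Alice now")
time_query = understand("What time is it")
other = understand("hello there")
```

The output is a representation of what the speaker wanted to achieve (for example placing a call to a named person) rather than a transcript of the words themselves.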


Last Revision: 01:03 16-Apr-1997