Q2.3: Finding start and end points of a speech signal

End-point detection algorithms identify the sections of an incoming audio signal that contain speech. Accurate end-pointing is a non-trivial task; however, reasonable behaviour can be obtained for inputs that contain only speech surrounded by silence (no other noises). Typical algorithms look at the energy or amplitude of the incoming signal and at the rate of "zero-crossings". A zero-crossing is a point where the audio signal changes from positive to negative or vice versa. When the energy and the zero-crossing rate reach certain levels, it is reasonable to guess that speech is present. More detailed descriptions are provided in the papers cited below and in the documentation for the following software.
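
As a rough illustration of the energy and zero-crossing idea, here is a minimal Python sketch. It is not the method of any paper cited below. The function names (frame_features, find_endpoints), the frame length, and all threshold values are assumptions chosen for a clean synthetic signal; real recordings would need thresholds estimated from the background noise. The sketch marks frames whose mean energy exceeds a threshold, then widens the region outward over neighbouring frames that have many zero-crossings (as weak fricatives often do).

```python
import math


def frame_features(samples, frame_len):
    """Return a list of (mean_energy, zero_crossings), one per non-overlapping frame."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # A zero-crossing is a sign change between consecutive samples.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        features.append((energy, crossings))
    return features


def find_endpoints(samples, frame_len=80, energy_thresh=0.01,
                   zc_thresh=20, max_extend=5):
    """Return (start_sample, end_sample) of the guessed speech region, or None.

    All default values are illustrative assumptions, not recommended settings.
    """
    feats = frame_features(samples, frame_len)
    loud = [i for i, (energy, _) in enumerate(feats) if energy > energy_thresh]
    if not loud:
        return None
    first, last = loud[0], loud[-1]

    # Widen the region backwards over frames with many zero-crossings.
    steps = 0
    while first > 0 and steps < max_extend and feats[first - 1][1] > zc_thresh:
        first -= 1
        steps += 1

    # Widen the region forwards in the same way.
    steps = 0
    while (last < len(feats) - 1 and steps < max_extend
           and feats[last + 1][1] > zc_thresh):
        last += 1
        steps += 1

    return first * frame_len, (last + 1) * frame_len


# Synthetic test input at an assumed 8000 Hz sampling rate:
# 0.1 s of silence, 0.2 s of a 200 Hz tone, 0.1 s of silence.
RATE = 8000
silence = [0.0] * 800
tone = [0.5 * math.sin(2 * math.pi * 200 * n / RATE) for n in range(1600)]
signal = silence + tone + silence

print(find_endpoints(signal))  # (800, 2400)
```

The result is a pair of sample indices, so dividing by the sampling rate gives the start and end times in seconds (0.1 s and 0.3 s for the synthetic signal above). With pure digital silence the zero-crossing step does nothing; it only matters when low-energy, noise-like speech sounds sit at the edges of the utterance.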

End-point detection software is available from:

Plenty of research papers have been presented on end-pointing. Try the following:

Rabiner, L.R. and Sambur, M.R. "An Algorithm for Determining the Endpoints of Isolated Utterances." Bell System Technical Journal, Vol 54, No 2, 1975, pp. 297-315.
Drago, P.G. et al. "Digital Dynamic Speech Detectors." IEEE Trans on Communications, Vol 26, No 1, Jan 1978, pp. 140-145.
Newman, W.C. "Detecting Speech with an Adaptive Neural Network." Electronic Design, 22 March 1990.
Taboada, J. et al. "Explicit Estimation of Speech Boundaries." IEE Proc. Sci. Meas. Technol., Vol 141, No 3, May 1994, pp. 153-159.

Last Revision: 14:11 13-May-1997