Cepstral analysis

The source filter model of speech production decomposes the speech signal, $s(n)$, into an excitation, $e(n)$, and a linear filter, $H(\omega)$. In the frequency domain:

$$S(\omega) = H(\omega)\,E(\omega)$$

We wish $H(\omega)$ to represent the envelope of the speech power spectra and $E(\omega)$ to represent the fine detail of the excitation. For example, see figure 22 and figure 23. With a suitable definition of the log of a complex number ($\log z = \log|z| + j\arg z$) this may be achieved with:

$$\log S(\omega) = \log H(\omega) + \log E(\omega)$$

For most speech processing applications we require only the amplitude spectra, hence the equation is written:

$$\log|S(\omega)| = \log|H(\omega)| + \log|E(\omega)|$$

The slowly varying components of $\log|S(\omega)|$ are represented by the low frequencies and the fine detail by the high frequencies. Hence another Fourier transform is the natural way to separate the components of $H(\omega)$ and $E(\omega)$. This produces the cepstral analysis, shown diagrammatically in figure 28.
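
As a worked statement of this step: the inverse Fourier transform is linear, so the sum of log amplitude spectra becomes a sum in the transformed domain. The symbols below are notation assumed here for illustration: $S(\omega)$, $H(\omega)$ and $E(\omega)$ for the spectra of the speech, the filter and the excitation, $c_s$, $c_h$ and $c_e$ for their real cepstra, and $n$ for the cepstral index.

$$c_s(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log|S(\omega)|\, e^{j\omega n}\, d\omega = c_h(n) + c_e(n)$$

If the envelope term varies slowly with $\omega$, $c_h(n)$ is concentrated at small $n$, while the periodic fine structure of the excitation places $c_e(n)$ in peaks at larger $n$.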

Figure 28: Cepstral analysis

For the example speech of figures 21, 22 and 23, the resulting (real) cepstral analysis is shown in figure 29.

Figure 29: The full real cepstrum. Calculated using Matlab: ifft(log(abs(fft(hamming(512) .* sig))))

It can be seen that most of the detail occurs near the origin and in peaks higher up the cepstrum. Thus the lower numbered coefficients provide the envelope information. The remainder of the detail is mostly contained in the peaks, which are separated by the pitch period (in this case about 70 samples) and provide the fine detail pitch information.
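
These observations can be tried out numerically. The following is a minimal sketch in Python using only the standard library: it mirrors the Matlab expression from the figure 29 caption, then reads off the low-order coefficients and the lag of the strongest peak further up the cepstrum. The function names, the recursive FFT helper, the floor inside the log, the search range starting at lag 20 and the synthetic test signal (an impulse train with a period of 70 samples passed through a two-pole resonator) are all assumptions made for illustration, not the code or data behind the figures.

```python
import cmath
import math


def fft(x, inverse=False):
    """Unscaled recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def real_cepstrum(frame):
    """Python rendering of ifft(log(abs(fft(hamming(N) .* sig))))."""
    n = len(frame)
    # Symmetric Hamming window, matching Matlab's hamming(n).
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    spectrum = fft([w * s for w, s in zip(window, frame)])
    # Assumption: a tiny floor avoids log(0); the Matlab expression has none.
    log_mag = [math.log(max(abs(v), 1e-12)) for v in spectrum]
    # The log magnitude of a real signal is real and even, so its inverse
    # transform is real; scale by 1/n and keep the real part.
    return [v.real / n for v in fft(log_mag, inverse=True)]


def pitch_period(cepstrum, min_lag=20):
    """Lag of the largest cepstral value between min_lag and half the frame."""
    half = len(cepstrum) // 2
    return max(range(min_lag, half), key=lambda q: cepstrum[q])


def synthetic_voiced(n=512, period=70):
    """Hypothetical stand-in for speech: impulse train through a two-pole resonator."""
    y = []
    for i in range(n):
        excitation = 1.0 if i % period == 0 else 0.0
        y1 = y[-1] if i >= 1 else 0.0
        y2 = y[-2] if i >= 2 else 0.0
        y.append(excitation + 1.3 * y1 - 0.8 * y2)
    return y


cep = real_cepstrum(synthetic_voiced())
envelope_coeffs = cep[:20]    # low-order coefficients: envelope information
print(pitch_period(cep))      # lag of the strongest higher peak, in samples
```

On real speech the lower search limit would need to be chosen to suit the expected pitch range; the value 20 here is only a convenient cut-off for the synthetic signal.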

An enlargement of the first few samples is shown in figure 30.

Figure 30: The first 20 cepstral coefficients

Speech Vision Robotics group/Tony Robinson