
Cepstral analysis

The source-filter model of speech production decomposes the speech signal, $s(n)$, into an excitation, $e(n)$, and a linear filter, $H(\omega)$. In the frequency domain:
$$S(\omega) = H(\omega)\,E(\omega)$$

We wish $H(\omega)$ to represent the envelope of the speech power spectrum and $E(\omega)$ to represent the fine detail of the excitation (see, for example, figure 22 and figure 23). Taking logarithms turns the product into a sum, so with a suitable definition of the log of a complex number ($\log z = \log|z| + j \arg z$) this may be achieved with:
$$\log S(\omega) = \log H(\omega) + \log E(\omega)$$

For most speech processing applications we require only the amplitude spectrum, hence the equation is written:
$$\log|S(\omega)| = \log|H(\omega)| + \log|E(\omega)|$$
Regarded as a function of frequency, the slowly varying components of $\log|S(\omega)|$ are represented by the low frequencies of its own transform and the fine detail by the high frequencies. Hence another Fourier transform is the natural way to separate the components of $H(\omega)$ and $E(\omega)$. This produces the cepstral analysis, shown diagrammatically in figure 28.
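
As a worked form of this step, assume an $N$-point discrete Fourier transform, write $S(k)$, $H(k)$ and $E(k)$ for the sampled spectra of the speech, filter and excitation, and write $c(n)$ for the real cepstrum (this notation is an assumption made here for illustration). Because the inverse transform is linear, the additive decomposition of the log amplitude spectrum carries over to the cepstrum:

```latex
\begin{eqnarray*}
c(n) &=& \frac{1}{N} \sum_{k=0}^{N-1} \log|S(k)| \, e^{j 2\pi k n / N} \\
     &=& \frac{1}{N} \sum_{k=0}^{N-1} \log|H(k)| \, e^{j 2\pi k n / N}
       + \frac{1}{N} \sum_{k=0}^{N-1} \log|E(k)| \, e^{j 2\pi k n / N} \\
     &=& c_h(n) + c_e(n)
\end{eqnarray*}
```

Here $c_h(n)$, concentrated at small $n$, would carry the envelope and $c_e(n)$, at larger $n$, the excitation detail.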

Figure 28: Cepstral analysis
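
The processing chain of Figure 28 can be sketched in a few lines. What follows is a minimal, illustrative Python version of the Matlab expression quoted with Figure 29, using only the standard library. The helper names (`hamming`, `fft`, `ifft`, `real_cepstrum`), the recursive radix-2 FFT, the small floor guarding the logarithm, and the synthetic test frame are all assumptions made for this sketch, not part of the original notes.

```python
import cmath
import math


def hamming(n):
    """Symmetric Hamming window of length n (same formula as Matlab's hamming(n))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(x):
    """Inverse FFT via the conjugation identity ifft(x) = conj(fft(conj(x))) / n."""
    n = len(x)
    y = fft([v.conjugate() for v in x])
    return [v.conjugate() / n for v in y]


def real_cepstrum(frame, floor=1e-12):
    """Window -> FFT -> log magnitude -> inverse FFT."""
    w = hamming(len(frame))
    spectrum = fft([wi * si for wi, si in zip(w, frame)])
    # The floor (an assumption of this sketch) avoids log(0) on empty bins.
    log_mag = [math.log(max(abs(v), floor)) for v in spectrum]
    # log_mag is real and even for a real frame, so the inverse FFT is real
    # up to rounding error; keep the real part only.
    return [v.real for v in ifft(log_mag)]


# Usage: a synthetic "voiced" frame (hypothetical test signal), an impulse
# train with a 70-sample period passed through a one-pole filter.
frame = []
y = 0.0
for i in range(512):
    y = 0.9 * y + (1.0 if i % 70 == 0 else 0.0)
    frame.append(y)
cep = real_cepstrum(frame)
```

In practice a library FFT would normally replace the hand-written one; the recursive version is used here only to keep the sketch free of dependencies.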

For the example speech of figures 21, 22 and 23, the resulting (real) cepstral analysis is shown in figure 29.

Figure 29: The full real cepstrum. Calculated using Matlab: ifft(log(abs(fft(hamming(512) .* sig))))

It can be seen that most of the detail occurs near the origin and in peaks higher up the cepstrum. The lower-numbered coefficients thus provide the envelope information. The remainder of the detail is mostly contained in peaks separated by the pitch period (in this case about 70 samples), which provide the fine-detail pitch information.
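
The two regions described above suggest two simple operations on a cepstrum: keep only the low-numbered coefficients to retain the envelope, and search the higher coefficients for the largest peak to estimate the pitch period. The sketch below illustrates both on a list of cepstral coefficients. The function names, the cutoff of 20 coefficients and the search range of 40 to 200 samples are illustrative assumptions, and the toy cepstrum is hand-built rather than taken from the example speech.

```python
def low_time_lifter(cep, cutoff=20):
    """Zero everything except the first `cutoff` coefficients and their mirror
    images at the far end of the (symmetric) real cepstrum."""
    n = len(cep)
    return [c if (i < cutoff or i > n - cutoff) else 0.0 for i, c in enumerate(cep)]


def pitch_period(cep, lo=40, hi=200):
    """Index (in samples) of the largest cepstral coefficient in lo..hi-1."""
    return max(range(lo, hi), key=lambda i: cep[i])


# Toy cepstrum (hypothetical values): a decaying envelope part near the origin
# plus a peak at 70 samples and a weaker one at 140, mirrored to keep it even.
n = 512
cep = [0.0] * n
cep[0] = -2.0
for i in range(1, 20):
    cep[i] = cep[n - i] = 1.0 / i
cep[70] = cep[n - 70] = 0.5
cep[140] = cep[n - 140] = 0.2

envelope_part = low_time_lifter(cep)  # envelope coefficients only
period = pitch_period(cep)
print(period)  # 70
```

Transforming `envelope_part` back to the frequency domain would give a smoothed log amplitude spectrum; the lower search bound keeps the peak picker away from the envelope coefficients near the origin.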

An enlargement of the first few samples is shown in figure 30.

Figure 30: The first 20 cepstral coefficients



Speech Vision Robotics group/Tony Robinson