The source-filter model of speech production decomposes the speech
signal, $s(n)$, into an excitation, $e(n)$, and a linear filter, $h(n)$. In the frequency domain:

$$S(\omega) = E(\omega) H(\omega)$$

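We wish the filter spectrum, $H(\omega)$, to represent the envelope of the speech power
spectra and the excitation spectrum, $E(\omega)$, to represent the fine detail of the
excitation. For example, see figure 22 and
figure 23. With a suitable definition of the log of a
complex number ($\log z = \log|z| + j\arg z$) this may be achieved
with:

$$\log S(\omega) = \log E(\omega) + \log H(\omega)$$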

For most speech processing applications we require only the amplitude
spectra, hence the equation is written:

$$\log|S(\omega)| = \log|E(\omega)| + \log|H(\omega)|$$

The slowly varying components of $\log|S(\omega)|$ are
represented by the low frequencies and the fine detail by the high
frequencies. Hence another Fourier transform is the natural way to
separate the components of $\log|E(\omega)|$ and $\log|H(\omega)|$. This
produces the cepstral analysis, shown diagrammatically in
figure 28.
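
Because the Fourier transform is linear, the additive form of the log amplitude spectrum carries over to the cepstral domain. As a short worked step, with $S$, $E$ and $H$ denoting the spectra of the speech, the excitation and the filter, and with $c_s(n)$, $c_e(n)$ and $c_h(n)$ as notation assumed here for the corresponding cepstra (these names are not taken from the notes):

$$c_s(n) = \mathcal{F}^{-1}\{\log|S(\omega)|\} = \mathcal{F}^{-1}\{\log|E(\omega)|\} + \mathcal{F}^{-1}\{\log|H(\omega)|\} = c_e(n) + c_h(n)$$

The slowly varying envelope term $c_h(n)$ is then concentrated at low $n$, while the excitation term $c_e(n)$ shows up at higher $n$.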

For the example speech of figures 21, 22 and 23, the resulting (real) cepstral analysis is shown in figure 29.

**Figure 29:** The full real cepstrum.
Calculated using Matlab: `ifft(log(abs(fft(hamming(512) .* sig))))`
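
For readers without Matlab, the same chain of operations can be sketched in Python using only the standard library. This is a minimal illustration rather than the code behind the figure: the names `dft`, `hamming` and `real_cepstrum` and the `floor` guard against `log(0)` are assumptions introduced here, and the naive O(N²) transform stands in for a proper FFT.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive discrete Fourier transform; adequate for a single short frame."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            # Reduce k*t modulo n to keep the phase argument small
            acc += x[t] * cmath.exp(sign * 2j * math.pi * ((k * t) % n) / n)
        out.append(acc / n if inverse else acc)
    return out

def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def real_cepstrum(frame, floor=1e-12):
    """Inverse transform of the log magnitude spectrum of the windowed frame."""
    windowed = [w * s for w, s in zip(hamming(len(frame)), frame)]
    # 'floor' is an assumed safeguard so that a zero magnitude does not reach log()
    log_mag = [math.log(max(abs(v), floor)) for v in dft(windowed)]
    return [v.real for v in dft(log_mag, inverse=True)]

# Usage: a unit impulse has a flat spectrum, so only coefficient 0 is non-zero
impulse_cepstrum = real_cepstrum([1.0] + [0.0] * 15)
```

For a 512-sample frame the double loop is slow but still workable; a library FFT would be the obvious replacement in practice.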

It can be seen that most of the detail occurs near the origin and in peaks further along the cepstrum. The low-numbered coefficients therefore carry the envelope information. Most of the remaining detail lies in peaks separated by the pitch period (in this case about 70 samples), and these carry the fine-detail pitch information.
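
The split described above can be sketched as a small Python function that takes an already computed real cepstrum, keeps the low-numbered coefficients as the envelope part, and looks for the largest peak beyond them as a pitch period estimate. The function name `split_cepstrum` and the default values `n_envelope=20` and `min_lag=30` are assumptions chosen for illustration, and the toy cepstrum below is hand-built rather than taken from the example speech.

```python
def split_cepstrum(cepstrum, n_envelope=20, min_lag=30):
    """Return (envelope coefficients, estimated pitch period in samples)."""
    # The real cepstrum is symmetric, so only the first half needs searching
    half = len(cepstrum) // 2
    envelope = cepstrum[:n_envelope]
    # Skip the region near the origin, which is dominated by envelope detail
    pitch_period = max(range(min_lag, half), key=lambda n: cepstrum[n])
    return envelope, pitch_period

# Usage with a hand-built toy cepstrum: decaying low-order terms plus
# peaks at multiples of an assumed 70-sample pitch period
toy = [0.0] * 512
for n in range(20):
    toy[n] = 1.0 / (n + 1)
toy[70] = 0.4
toy[140] = 0.2
envelope, period = split_cepstrum(toy)
```

The lower search bound matters: without it the large values near the origin would always win over the pitch peak.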

An enlargement of the first few samples is shown in figure 30.

**Figure 30:** The first 20 cepstral coefficients
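
To illustrate why these low-numbered coefficients are said to hold the envelope, the sketch below zeroes everything except the first few coefficients (and their mirror images at the end of the symmetric cepstrum) and transforms back to a smoothed log amplitude spectrum. The function name `smoothed_log_spectrum` and the parameter `n_keep` are assumptions made for this example; it is not the procedure used to produce the figures.

```python
import cmath
import math

def smoothed_log_spectrum(cepstrum, n_keep=20):
    """Forward transform of the cepstrum after discarding high-numbered terms."""
    n = len(cepstrum)
    # Keep coefficients 0..n_keep-1 and the mirrored ones at n-n_keep+1..n-1
    kept = [c if (i < n_keep or i > n - n_keep) else 0.0
            for i, c in enumerate(cepstrum)]
    spectrum = []
    for k in range(n):
        acc = 0j
        for t, c in enumerate(kept):
            if c != 0.0:
                acc += c * cmath.exp(-2j * math.pi * ((k * t) % n) / n)
        # A symmetric real cepstrum gives a real log spectrum
        spectrum.append(acc.real)
    return spectrum

# Usage: a toy 32-point cepstrum with one low-order term and one
# high-order term; keeping 4 coefficients removes the high-order term
toy = [0.0] * 32
toy[0] = 1.0
toy[1] = toy[31] = 0.25
toy[10] = toy[22] = 5.0
envelope = smoothed_log_spectrum(toy, n_keep=4)
```

In the toy case only the constant and the single slow cosine survive, which is the smooth envelope; the discarded high-numbered term would have added a rapid ripple.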

Speech Vision Robotics group/Tony Robinson