The source filter model of speech production decomposes the speech
signal, $s(n)$, into an excitation, $e(n)$, and a linear filter, $h(n)$. In the frequency domain:
$$S(\omega) = H(\omega) E(\omega)$$
We wish $H(\omega)$ to represent the envelope of the speech power
spectra and $E(\omega)$ to represent the fine detail of the
excitation. For example, see figure 22 and
figure 23. With a suitable definition of the log of a
complex number ($\log z = \log|z| + j \arg z$) this may be achieved:
$$\log S(\omega) = \log H(\omega) + \log E(\omega)$$
For most speech processing applications we require only the amplitude
spectra, hence the equation is written:
$$\log|S(\omega)| = \log|H(\omega)| + \log|E(\omega)|$$
Treated as a function of frequency, the slowly varying components of the log amplitude spectrum, $\log|S(\omega)|$, are represented by its low frequencies and the fine detail by its high frequencies. Hence another Fourier transform is the natural way to separate the contributions of the filter, $H(\omega)$, and the excitation, $E(\omega)$. This produces the cepstral analysis, shown diagrammatically in figure 28.
Figure 28: Cepstral analysis
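The chain in the diagram can also be written as a single expression. The following is a sketch in assumed notation: $S(\omega)$, $H(\omega)$ and $E(\omega)$ denote the spectra of the speech, the filter and the excitation, and the names $c(n)$, $c_h(n)$ and $c_e(n)$ are introduced here for illustration only. Because the transform is linear, the sum of log amplitude spectra stays a sum in the cepstral domain:

$$c(n) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \log|S(\omega)| \, e^{j\omega n} \, d\omega = c_h(n) + c_e(n)$$

where $c_h(n)$ and $c_e(n)$ are the same transform applied to $\log|H(\omega)|$ and $\log|E(\omega)|$. Since $\log|S(\omega)|$ is real and even for a real signal, using a forward rather than an inverse transform here would change only the scaling.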
For the example speech of figures 21, 22 and 23, the resulting (real) cepstral analysis is shown in figure 29.
Figure 29: The full real cepstrum. Calculated using Matlab: ifft(log(abs(fft(hamming(512) .* sig))))
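For readers without Matlab, the expression in the caption can be approximated in Python with NumPy. This is a minimal sketch, not the code used to produce the figure: the function name `real_cepstrum`, the small floor added before the logarithm, and the synthetic placeholder signal `sig` are all assumptions made here for illustration.

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum of one frame: ifft(log(abs(fft(window * frame))))."""
    windowed = np.hamming(len(frame)) * frame
    magnitude = np.abs(np.fft.fft(windowed))
    # Assumption: a tiny floor guards against log(0); the Matlab line has none.
    log_magnitude = np.log(magnitude + 1e-12)
    # log_magnitude is real and even, so the inverse FFT is real up to rounding.
    return np.fft.ifft(log_magnitude).real

# Placeholder 512-sample frame (two sinusoids), standing in for real speech.
t = np.arange(512)
sig = np.sin(2 * np.pi * 0.05 * t) + 0.5 * np.sin(2 * np.pi * 0.12 * t)
cepstrum = real_cepstrum(sig)
```

The result has the same length as the frame and is symmetric about its midpoint, which is why only the first half of the cepstrum is usually examined.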
It can be seen that most of the detail occurs near the origin and in peaks further along the cepstrum. The low-numbered coefficients therefore carry the envelope information, while most of the remaining detail lies in peaks spaced one pitch period apart (in this case about 70 samples), which carry the fine pitch detail.
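The separation described above can be sketched in Python with NumPy. Everything in this example is an assumption made for illustration: the synthetic frame (an impulse train with a 70-sample period passed through a decaying cosine standing in for the vocal tract filter), the cutoff of 20 coefficients, and the helper names `real_cepstrum` and `split_cepstrum`. It is a simplified demonstration, not a robust pitch tracker.

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum of a Hamming-windowed frame."""
    spectrum = np.fft.fft(np.hamming(len(frame)) * frame)
    return np.fft.ifft(np.log(np.abs(spectrum) + 1e-12)).real

def split_cepstrum(cep, cutoff=20):
    """Return (smoothed log spectrum, estimated pitch period in samples)."""
    n = len(cep)
    # Keep coefficients 0..cutoff-1 and their mirror images at the far end.
    lifter = np.zeros(n)
    lifter[:cutoff] = 1.0
    lifter[n - cutoff + 1:] = 1.0
    # Transforming the low coefficients back gives the spectral envelope.
    envelope = np.fft.fft(cep * lifter).real
    # The largest peak above the cutoff is taken as the pitch period.
    pitch_period = cutoff + int(np.argmax(cep[cutoff:n // 2]))
    return envelope, pitch_period

# Synthetic voiced frame: impulses every 70 samples through a toy resonance.
n, period = 512, 70
excitation = np.zeros(n)
excitation[::period] = 1.0
k = np.arange(200)
impulse_response = 0.9 ** k * np.cos(0.3 * np.pi * k)
frame = np.convolve(excitation, impulse_response)[:n]

cep = real_cepstrum(frame)
envelope, pitch_period = split_cepstrum(cep)
print("estimated pitch period (samples):", pitch_period)
```

For this synthetic frame the estimate should fall close to the 70-sample spacing of the impulses. On real speech the search range and cutoff would need to be chosen to suit the sampling rate and the expected pitch range.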
An enlargement of the first few samples is shown in figure 30.
Figure 30: The first 20 cepstral coefficients