The previous sections have demonstrated that low order linear prediction followed by Huffman coding matched to the Laplace distribution results in an efficient lossless waveform coder. Table 2 compares this technique with popular general purpose compression utilities. The table shows that the speech specific compression utility can achieve considerably better compression than the more general tools. The compression and decompression speeds are expressed as factors faster than real time when executed on a standard SparcStation I, except for the g722 ADPCM result, which was obtained on an SGI Indigo R4400 workstation using the supplied aifccompress/aifcdecompress utilities. The SGI timings were scaled by a factor of 3.9, which was determined from the relative execution times of shorten decompression on the two platforms.
Table 2: Compression rates and speeds
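The coder summarised at the start of this section can be illustrated with a minimal sketch. It assumes fixed polynomial predictors of order 0 to 3 and assumes that the Huffman code for a Laplacian residual takes the Rice (power-of-two Golomb) form. The function names, the zero initial history and the synthetic test tone are illustrative choices, not taken from the shorten source.

```python
import math

def residual(samples, order):
    """Prediction error of a fixed polynomial predictor (order 0 to 3).

    Assumption: samples before the start are zero, so the order-p error
    is the p-th difference of the signal.
    """
    err = list(samples)
    for _ in range(order):
        err = err[:1] + [err[i] - err[i - 1] for i in range(1, len(err))]
    return err

def reconstruct(err, order):
    """Invert residual() by repeated running sums (lossless)."""
    out = list(err)
    for _ in range(order):
        total, summed = 0, []
        for e in out:
            total += e
            summed.append(total)
        out = summed
    return out

def rice_bits(e, k):
    """Length in bits of a Rice code with parameter k for signed value e."""
    u = 2 * e if e >= 0 else -2 * e - 1   # fold the sign into the low bit
    return (u >> k) + 1 + k               # unary quotient, stop bit, k low bits

def coded_bits(samples, order):
    """Total bits for the residual, using the best k in 0..15."""
    err = residual(samples, order)
    return min(sum(rice_bits(e, k) for e in err) for k in range(16))

# Hypothetical test signal: a slowly varying tone stored as integers.
tone = [round(1000 * math.sin(2 * math.pi * i / 100)) for i in range(400)]
for order in range(4):
    print(order, coded_bits(tone, order) / len(tone), "bits per sample")
```

On a smooth signal such as this one, the higher order predictors leave a much smaller residual, so the cost in bits per sample falls well below that of coding the raw samples. A real coder would choose the predictor order and the code parameter per block rather than once for the whole signal.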
To investigate the effects of lossy coding on speech recognition performance, the test portion of the TIMIT database was coded at four bits per sample and the resulting speech was recognised with a state of the art phone recognition system. Both shorten and the g722 ADPCM standard gave negligible additional errors (about 70 more errors over the baseline of 15934 errors), but it was necessary to apply a factor of four scaling to the waveform for use with the g722 ADPCM algorithm. g722 ADPCM without scaling and the telephony quality g721 ADPCM algorithm (designed for 8kHz sampling and operated at 16kHz) both produced significantly more errors (approximately 500 more errors over the same baseline). Coding this database at four bits per sample results in approximately another factor of two compression over lossless coding.
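As a rough check on the scale of these figures, the additional errors can be expressed as a percentage of the baseline. The sketch below assumes that both counts (about 70 and about 500) are additional errors on top of the 15934 baseline errors; the function name is illustrative.

```python
BASELINE_ERRORS = 15934  # phone recognition errors on uncoded TIMIT test data

def relative_increase(extra_errors, baseline=BASELINE_ERRORS):
    """Additional errors as a percentage of the baseline error count."""
    return 100.0 * extra_errors / baseline

# Approximate counts quoted in the text.
print(f"shorten / scaled g722: {relative_increase(70):.2f}% more errors")
print(f"unscaled g722 / g721:  {relative_increase(500):.2f}% more errors")

# Four bits per sample being about a factor of two beyond lossless coding
# implies a lossless rate of roughly eight bits per sample on this data.
implied_lossless_bits = 4 * 2
print("implied lossless rate:", implied_lossless_bits, "bits per sample (approx.)")
```

Under that reading, the increase is below half a percent for shorten and scaled g722 ADPCM, and about three percent for the other two configurations.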
Decompression and playback of 16 bit, 44.1 kHz stereo audio takes approximately 45% of the available processing power of a 486DX2/66 based machine and 25% of that of a 60 MHz Pentium. Disk access accounted for 20% of the time on the slower machine. Lossy compression to three bits per sample gives another factor of three compression, reducing the disk access time proportionally and providing 20% faster execution with no perceptual degradation (to the author's ears). Thus real time decompression of high quality audio is possible for a wide range of personal computers.
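The data rates and real time margins behind these figures can be worked through with a small sketch. The helper names are illustrative, and the calculation uses only the sampling parameters and load percentages quoted above.

```python
def bytes_per_second(rate_hz, channels, bits_per_sample):
    """Raw data rate of a PCM stream at the given bits per sample."""
    return rate_hz * channels * bits_per_sample / 8

def realtime_factor(cpu_fraction):
    """How many times faster than real time a given CPU load implies."""
    return 1.0 / cpu_fraction

uncompressed = bytes_per_second(44100, 2, 16)  # 16 bit stereo at 44.1 kHz
lossy = bytes_per_second(44100, 2, 3)          # three bits per sample

print("uncompressed:", uncompressed, "bytes/s")
print("three bits per sample:", lossy, "bytes/s")
print("486DX2/66: about", round(realtime_factor(0.45), 1), "times real time")
print("60 MHz Pentium: about", round(realtime_factor(0.25), 1), "times real time")
```

A stream coded at three bits per sample needs 33075 bytes per second from disk, compared with 176400 bytes per second for the uncompressed audio.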