Fast Viterbi word-spotting

Fast implementations of Viterbi-based word-spotting

In any document retrieval system, the time taken to respond to a search request is an important issue. In fixed-keyword systems, the keyword locations are typically determined in advance and stored in a table for fast look-up when a search request is received. In open-vocabulary systems, the keyword locations have to be determined at the time of the search request, so fast word-spotting techniques are essential.

The following is the abstract of a paper presented at ICASSP'96 in Atlanta. Patent applied for.

Fast implementations of Viterbi-based word-spotting

K.M.Knill and S.J.Young

This paper explores methods of increasing the speed of a Viterbi-based word-spotting system for audio document retrieval. Fast processing is essential since the user expects to receive the results of a keyword search many times faster than the actual length of the speech. A number of computational short-cuts to the standard Viterbi word-spotter are presented. These are based on exploiting the background Viterbi phone recognition path that is computed to provide a normalisation base. An initial approximation using the phone transition boundaries reduces the retrieval time by a factor of 5, while achieving a slight improvement in word-spotting performance. To further reduce retrieval time, pattern matching, feature selection, and Gaussian selection techniques are applied to this approximate pass to give a total x50 increase in speed with little loss in performance. In addition, a low memory requirement means that these approaches can be implemented on any platform, including hand-held devices.
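The idea of reusing the background phone path can be illustrated with a small sketch. This is not the authors' implementation. It assumes that keyword hypotheses may only start and end at phone transition boundaries of the background path, that transition probabilities are ignored, and that the score is a duration-normalised log-likelihood ratio against the background. The names `keyword_viterbi`, `spot_keyword`, `kw_ll`, `bg_ll`, `boundaries` and `max_phones` are all hypothetical.

```python
import math


def keyword_viterbi(kw_ll, start, end):
    """Best log-likelihood of a left-to-right keyword HMM over frames [start, end).

    kw_ll[t][s] is the log-likelihood of frame t under keyword state s.
    Assumption: transition log-probabilities are treated as zero.
    """
    n_states = len(kw_ll[0])
    if end - start < n_states:
        return -math.inf  # too few frames to visit every state
    prev = [-math.inf] * n_states
    prev[0] = kw_ll[start][0]  # the path must begin in the first state
    for t in range(start + 1, end):
        cur = [-math.inf] * n_states
        for s in range(n_states):
            stay = prev[s]
            advance = prev[s - 1] if s > 0 else -math.inf
            cur[s] = max(stay, advance) + kw_ll[t][s]
        prev = cur
    return prev[-1]  # the path must end in the last state


def spot_keyword(kw_ll, bg_ll, boundaries, max_phones=4):
    """Score keyword hypotheses that start and end on background phone boundaries.

    bg_ll[t] is the per-frame log-likelihood along the background phone path.
    boundaries is a sorted list of phone transition frames, including 0 and
    len(bg_ll). Returns (score, start, end) for the best hypothesis.
    """
    # Prefix sums give the background score of any segment in constant time.
    cum = [0.0]
    for value in bg_ll:
        cum.append(cum[-1] + value)

    best = (-math.inf, None, None)
    for i, start in enumerate(boundaries[:-1]):
        # Only consider segments spanning at most max_phones background phones.
        for end in boundaries[i + 1 : i + 1 + max_phones]:
            kw = keyword_viterbi(kw_ll, start, end)
            score = (kw - (cum[end] - cum[start])) / (end - start)
            if score > best[0]:
                best = (score, start, end)
    return best
```

Restricting start and end points to phone boundaries means the keyword model is evaluated on a small number of candidate segments rather than at every frame, which is where the saving in this sketch comes from.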

Full paper. Please note that one of the Gaussian Selection equations quoted in this paper was wrong. See my page on Gaussian Selection for details and the correct equation. The results all still hold.


Please send bug reports/comments/suggestions to Kate Knill (kmk@eng.cam.ac.uk)