Abstract for james_thesis

PhD Thesis, University of Cambridge


David A James

June 1995

The research presented in this thesis addresses the topic of ad hoc retrieval of information from collections of spoken items such as radio news bulletins.

Modern digital computers are becoming increasingly adept at processing non-textual data, such as speech. Consequently, new methods are required to allow users to pinpoint specific items of interest in large data collections. Such a method might exploit the Hidden Markov Model (HMM), which has proved successful as the basis for many experimental speech recognition systems, and the well-understood techniques of document retrieval that have arisen from many years of research into textual information retrieval (IR).

However, so far there has been little exploration of the potential combination of these methods in order to index 'spoken word' data. In the IR community, several papers have put forward an approach to the problem but this approach has not been properly tested. Work done in the speech recognition area has tended to concentrate on developing systems for topic classification. These systems are extensively pre-trained for the task of partitioning a set of spoken messages into a set of disjoint and exhaustive classes, each one representing some topic. Their utility is, in practice, limited by the fixed class set and slow operation, and they do not represent an approach to the problem of retrieving items that correspond to arbitrary topics.

This thesis describes experiments combining the techniques of classical information retrieval with HMM-based speech recognition methods in order to retrieve items from a collection of spoken messages corresponding to items of radio news. In a baseline system, a new technique for wordspotting allows items matching an arbitrary expression of the information requirement to be retrieved quickly and reasonably accurately. The system is subsequently improved through the addition of appropriate language models and the use of state-of-the-art acoustic modelling. Finally, performance is compared with that obtained by two alternative approaches, including one recently proposed in the IR literature, and found to be considerably superior.
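The general idea of combining wordspotting output with classical IR ranking can be illustrated with a minimal sketch. This is not the system described in the thesis: the function name `rank_messages`, the `(message_id, word, score)` hit format, the `min_score` threshold and the simple term-frequency times inverse-document-frequency weighting are all assumptions chosen for illustration. A real system would obtain the hits from an HMM-based wordspotter rather than from a hand-written list.

```python
import math
from collections import defaultdict


def rank_messages(hits, query_terms, n_messages, min_score=0.0):
    """Rank spoken messages against a query using putative keyword hits.

    hits: iterable of (message_id, word, score) tuples, assumed to come
          from some wordspotter (hypothetical format).
    query_terms: set of words expressing the information requirement.
    n_messages: total number of messages in the collection.
    min_score: hits scoring below this value are discarded.
    """
    # Count accepted occurrences of each query term in each message.
    tf = defaultdict(lambda: defaultdict(int))
    for message_id, word, score in hits:
        if word in query_terms and score >= min_score:
            tf[message_id][word] += 1

    # Document frequency: number of messages in which each term was spotted.
    df = defaultdict(int)
    for counts in tf.values():
        for word in counts:
            df[word] += 1

    # Score each message as the sum of tf * idf over the query terms found.
    scores = {
        message_id: sum(
            count * math.log(n_messages / df[word])
            for word, count in counts.items()
        )
        for message_id, counts in tf.items()
    }
    # Highest-scoring messages first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Usage with made-up hits over a four-message collection.
example_hits = [
    ("m1", "election", 0.9),
    ("m1", "vote", 0.8),
    ("m2", "election", 0.7),
    ("m3", "weather", 0.9),
]
ranked = rank_messages(example_hits, {"election", "vote"}, n_messages=4)
```

Here message `m1` ranks above `m2` because it contains both query terms, and the rarer term carries the larger weight. Messages with no accepted hits for any query term are left out of the ranking.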

Key Words: speech recognition, information retrieval, topic classification, keyword spotting, wordspotting.

(ftp:) james_thesis.ps.Z (http:) james_thesis.ps.Z
PDF (automatically generated from original PostScript document - may be badly aliased on screen):
  (ftp:) james_thesis.pdf | (http:) james_thesis.pdf
