Task Dependent Loss Functions in Speech Recognition: A-star search over recognition lattices

Download: PDF.

“Task Dependent Loss Functions in Speech Recognition: A-star search over recognition lattices” by V. Goel and W. Byrne. In Proc. of the European Conference on Speech Communication and Technology (EUROSPEECH), 1999.

Abstract

A recognition strategy that can be matched to specific system performance criteria has recently been found to yield improvements over the usual maximum a posteriori probability strategy. Some examples of different system performance criteria are word error rate (WER), F-measure for Named Entity extraction tasks, and word-specific errors for keyword spotting tasks. In the matched-to-the-task strategy the hypothesis is chosen to minimize the expected loss or the Bayes Risk under a loss function defined by the performance measure of interest. Due to the prohibitively expensive implementation of this strategy, only an approximate implementation as an N-best list rescoring scheme has been used so far. Our goal is to improve the performance of such risk-based decoders by developing search strategies that can incorporate more acoustic evidence. In this paper we present search algorithms to implement the risk-based recognition strategy over word lattices that contain acoustic and language model scores. These algorithms are extensions of the N-best list rescoring approximation and are formulated as A-star algorithms. We first present a single stack A-star search and show how to obtain an under-estimate and an over-estimate of the cost needed for the search. For loss functions that do not depend on time segmentation of hypotheses, a prefix-tree based simplification of the single stack algorithm is then derived. For yet a further subset of loss functions, including the usual Levenshtein distance based loss for WER reduction tasks, we describe a search organization that facilitates further efficiencies in computation and storage. Finally we present a path equivalence criterion for merging of prefix tree nodes during search to allow for a larger search space. We find that restricted loss functions yield the most efficient search procedures. However the general single stack search can be applied quite broadly even in principle to loss functions that measure semantic agreement between sentences. Preliminary experiments were performed for WER reduction task on the Switchboard corpus, dev-test set of the 1997 JHU-LVCSR workshop. We obtain an error rate reduction of 0.8-0.9% absolute over a baseline of 38.5% WER. The search speed is comparable to the N-best list rescoring procedure which is much more restrictive in the amount of hypotheses considered for search and produces slightly inferior results (0.5-0.6% absolute improvement). At the conference we will present the framework of task dependent recognition strategy, its implementation as A-star search, and the speed and accuracy comparison of the search with N-best list rescoring procedure.
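For readers unfamiliar with the N-best list rescoring approximation that the abstract uses as its point of comparison, the following is a minimal sketch of the idea, not the authors' implementation. It assumes each hypothesis comes with a single combined acoustic and language model log score, that posteriors are obtained by normalizing those scores over the list, and that the loss is word-level Levenshtein distance. The function names (`levenshtein`, `posteriors`, `mbr_rescore`) and the toy list are hypothetical.

```python
import math


def levenshtein(ref, hyp):
    """Word-level edit distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            sub = 0 if r == h else 1
            curr.append(min(prev[j] + 1,        # deletion
                            curr[j - 1] + 1,    # insertion
                            prev[j - 1] + sub)) # substitution or match
        prev = curr
    return prev[-1]


def posteriors(log_scores):
    """Normalize log scores over the list (assumption: the list is the whole space)."""
    m = max(log_scores)
    exps = [math.exp(s - m) for s in log_scores]
    z = sum(exps)
    return [e / z for e in exps]


def mbr_rescore(nbest):
    """Pick the hypothesis with minimum expected loss over the N-best list.

    nbest: list of (words, log_score) pairs.
    Returns (best_words, expected_loss).
    """
    hyps = [words for words, _ in nbest]
    post = posteriors([score for _, score in nbest])
    best, best_risk = None, float("inf")
    for cand in hyps:
        # Expected loss of choosing `cand`, treating every list entry as a
        # possible truth weighted by its posterior.
        risk = sum(p * levenshtein(ref, cand) for ref, p in zip(hyps, post))
        if risk < best_risk:
            best, best_risk = cand, risk
    return best, best_risk


# Toy list (made-up scores): the top-scoring hypothesis is not the minimum-risk one.
example_nbest = [
    (["a", "b", "c"], math.log(0.4)),
    (["a", "x", "c"], math.log(0.3)),
    (["a", "x", "d"], math.log(0.3)),
]
best_words, best_risk = mbr_rescore(example_nbest)
```

On this toy list the maximum a posteriori choice is the first hypothesis, while the minimum-risk choice is the second, because it sits closer on average to the other likely hypotheses. The lattice-based A-star search described in the abstract targets the same criterion over a much larger hypothesis space; this sketch does not attempt that.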

BibTeX entry:

@inproceedings{mbr_eurospeech99,
   author = {V. Goel and W. Byrne},
   title = {Task Dependent Loss Functions in Speech Recognition: {A}-star
	search over recognition lattices},
   booktitle = {Proc. of the European Conference on Speech Communication
	and Technology (EUROSPEECH)},
   pages = {(4 pages)},
   year = {1999}
}