Weighted Finite State Transducers in Statistical Machine Translation

“Weighted Finite State Transducers in Statistical Machine Translation” by W. Byrne, International Winter School in Language and Speech Technologies (WSLST 2012), Tarragona, Spain. Jan. 2012. Six lecture short course.


This short course will present some recent advances in statistical machine translation (SMT) using modelling approaches based on Weighted Finite State Transducers (WFSTs) and Finite State Automata (FSAs). The course will focus on decoding procedures for SMT, i.e. the generation of translations using stochastic translation grammars and language models. WFSTs offer a very powerful modelling framework for language processing. For problems which can be formulated in terms of WFSTs or FSAs, there are general purpose algorithms which can be used to implement efficient and exact search and estimation procedures. This is true even for problems which are not inherently finite state, such as translation with some stochastic context free grammars. The course will begin with an introduction to WFSTs, pushdown automata, and semirings in the context of SMT. The use of WFST and FSA modelling approaches will be presented for: SMT decoding with phrase-based models; SMT decoding with stochastic synchronous context free grammars (e.g. Hiero); SMT parameter optimisation (MERT); the use of large language models and 'fast' grammars in translation; translation lattice generation; and rescoring procedures such as minimum Bayes risk decoding and system combination. Implementations using the OpenFst toolkit will also be described. The course material will be suitable for researchers who are already familiar with SMT and wish to learn about alternative methods in decoder design. Enough background will be given so that researchers new to machine translation or unfamiliar with applications of WFSTs in natural language processing will also find the material appropriate.

BibTeX entry:

   author = {W. Byrne},
   title = {Weighted Finite State Transducers in Statistical Machine Translation},
   publisher = {International Winter School in Language and Speech
	Technologies (WSLST 2012), Tarragona, Spain},
   month = jan,
   year = {2012},
   note = {Six lecture short course.},
   url = {http://grammars.grlmc.com/wslst2012/courseDescription.php#Byrne}
