Multimodal Fusion. H. Liao. M.Phil CSTIT Thesis. July 2002.

Computing applications are becoming increasingly complex and pervasive, as exemplified by the ubiquity of microprocessors in everyday appliances and the resulting feature explosion. Recent technologies such as speech and gesture recognition can make such systems more natural, easier to use, and more robust, as work by [Oviatt et al. 2000] and other groups has shown. A challenge has been to fuse these different input modalities effectively, so that their natural strengths complement one another and their redundancies improve robustness. This project examines methods of combining information from different input modalities in multimodal dialogue systems. A multimodal framework is developed that is, in theory, recogniser-agnostic and domain-independent. The framework is implemented in this project with a commercial speech recogniser and a freeware gesture recogniser. Many improvements were made to the gesture recogniser, mainly to handle spatial gestures. Grammar rules and a unification-style algorithm were applied to fuse the semantic frames from the different modalities into candidate command frames for the application. Using a multimodal grammar and scoring, the best candidate frame is selected and sent to the application. To validate and evaluate the framework, a multimodal tourist guide for Bristol was built. As implemented, it supports zoom and pan commands for navigating the map, and queries about the locations of sites and the distances and paths between them. The evaluation task comprised four high-level goals that users were expected to carry out with minimal guidance. Despite the sparse instruction, users were enthusiastic about multimodal interaction. The unification algorithm, used in conjunction with N-best lists, proved to be a viable fusion method. It improved performance over both the estimated lower bound and the 1-best case, and allowed the system to improve recognition results by using only valid combinations of inputs.
These observations were made, however, with a set of just three users; this small sample size limits the conclusions that can be drawn from the evaluation.
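
The fusion step summarised above (unifying semantic frames drawn from N-best lists and keeping only valid combinations) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the thesis implementation: frames are modelled as flat Python dictionaries, the grammar is a hypothetical table of required slots per action, the joint score is taken to be the product of the two recogniser scores, and the names `unify`, `is_complete`, `fuse` and `GRAMMAR` are invented.

```python
from itertools import product


def unify(a, b):
    """Merge two flat semantic frames; return None if a shared slot conflicts."""
    merged = dict(a)
    for slot, value in b.items():
        if slot in merged and merged[slot] != value:
            return None  # conflicting values: the frames do not unify
        merged[slot] = value
    return merged


def is_complete(frame, grammar):
    """A frame is valid if its action is known and every required slot is filled."""
    required = grammar.get(frame.get("action"))
    return required is not None and all(slot in frame for slot in required)


def fuse(speech_nbest, gesture_nbest, grammar):
    """Return (frame, score) for the best valid fused hypothesis, or None."""
    best = None
    for (s_frame, s_score), (g_frame, g_score) in product(speech_nbest, gesture_nbest):
        frame = unify(s_frame, g_frame)
        if frame is None or not is_complete(frame, grammar):
            continue  # discard invalid combinations of inputs
        score = s_score * g_score  # assumed scoring: product of recogniser scores
        if best is None or score > best[1]:
            best = (frame, score)
    return best


# Hypothetical grammar: required slots for each command.
GRAMMAR = {"zoom": ["area"], "distance": ["from", "to"]}

# Hypothetical N-best lists of (frame, score) pairs, best hypothesis first.
speech_nbest = [
    ({"action": "distance", "from": "here"}, 0.6),
    ({"action": "zoom"}, 0.4),
]
gesture_nbest = [
    ({"area": (10, 20, 50, 60)}, 0.7),
    ({"to": "bridge"}, 0.3),
]

result = fuse(speech_nbest, gesture_nbest, GRAMMAR)
```

In this toy input the two 1-best hypotheses (a distance query and an area gesture) do not form a valid command, so they are rejected; the search over the N-best lists instead selects the zoom command combined with the area gesture, which is the highest-scoring valid pairing.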