[Univ of Cambridge] [Dept of Engineering]

Spoken Dialogue Management using Partially Observable Markov Decision Processes

EPSRC research grant

Machine Intelligence Laboratory

Dialogue Systems Group




People

Project Description

Spoken dialogue systems have a wide range of applications, including call centre automation, control of devices in the home, interactive entertainment, and hands-free applications. Despite their increasing use, however, deployment costs remain high and operational systems continue to be fragile. A major contributor to both of these problems is that the core dialogue manager, which interprets the spoken input and plans the next response, is a deterministic program, hand-crafted and manually tuned for each application. Experience applying statistical techniques in both speech recognition and synthesis has shown that learning from data and using optimal decision making can dramatically improve performance and lower costs.

[Figure: architecture]

A natural framework for statistical dialogue modelling is the Markov Decision Process (MDP). However, a major limitation of MDPs is that they require the state of the system to be known exactly, and therefore they do not address the essence of the dialogue management problem, which is to handle the uncertainty caused by speech recognition and understanding errors. The aim of this project is to develop a framework for spoken dialogue systems which uses a more general statistical model called a Partially Observable Markov Decision Process (POMDP). The key assumption in the POMDP is that the state of the system (which includes the goal in the user's mind) can never be known with certainty. Hence, it maintains a probability distribution over all possible states and bases its decisions on this distribution. In effect, the POMDP tracks every possible dialogue hypothesis at every turn, maintaining a probability for each. This provides it with a principled framework for handling ambiguity and uncertainty.
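As an illustration of what maintaining a probability distribution over states involves, the following is a minimal sketch of the standard belief update for a small discrete POMDP. It is not taken from the project's software: the function name, the two-goal state space and all probabilities are assumptions chosen purely for the example.

```python
def belief_update(belief, transition, observation_likelihood):
    """One step of the standard discrete POMDP belief update.

    belief: dict state -> b(s), the current probability of each state
    transition: dict (s, s_next) -> P(s_next | s, a) for the action just taken
    observation_likelihood: dict s_next -> P(o | s_next, a) for the observation received
    """
    states = list(belief)
    unnormalised = {}
    for s_next in states:
        # Prediction step: probability of reaching s_next from any previous state.
        predicted = sum(transition.get((s, s_next), 0.0) * belief[s] for s in states)
        # Correction step: weight by how well s_next explains the observation.
        unnormalised[s_next] = observation_likelihood.get(s_next, 0.0) * predicted
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return {s: p / total for s, p in unnormalised.items()}


# Hypothetical two-goal example: the user wants either a hotel or a restaurant.
belief = {"hotel": 0.5, "restaurant": 0.5}
# Assumption: the user's goal does not change between turns.
stay = {("hotel", "hotel"): 1.0, ("restaurant", "restaurant"): 1.0}
# Assumption: the recogniser output "restaurant" is more likely if that is the true goal.
heard_restaurant = {"hotel": 0.2, "restaurant": 0.7}

updated = belief_update(belief, stay, heard_restaurant)
# The belief shifts towards "restaurant" (about 0.78) without discarding "hotel" (about 0.22).
```

Because the alternative hypothesis keeps some probability mass, a later turn that contradicts the first recognition result can still recover the correct goal.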

Although this formulation is extremely powerful, it is also computationally very complex, since the POMDP state is a vector in a very high-dimensional continuous space. This makes direct belief monitoring and policy optimisation essentially intractable, and hence little progress has been made towards real applications. Recently, however, the proposer has demonstrated that practical POMDP-based systems are feasible by exploiting two key ideas. Firstly, the complexity of belief monitoring can be greatly reduced by partitioning the state space into equivalence classes. Secondly, in the context of spoken dialogues, it is possible to map dialogue hypotheses into a much-reduced summary space where effective policy optimisation is possible. These ideas have been built into a prototype system called the Hidden Information State (HIS) system and their feasibility has been demonstrated and evaluated in a Tourist Information domain.
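The two ideas can be illustrated with a toy sketch. It is a simplified stand-in rather than the HIS implementation: the `Partition` class, the `split` and `summarise` functions, the slot names and the prior values are all hypothetical. User goals start in a single equivalence class, which is split only when a slot value is mentioned, and the resulting belief is then reduced to a small summary point, assumed here to be just the two largest probabilities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """A set of user goals treated as one equivalence class."""
    constraints: tuple = ()  # (slot, value) pairs every goal in the class satisfies
    excluded: tuple = ()     # (slot, value) pairs no goal in the class satisfies


def split(partitions, slot, value, prior):
    """Split each partition that is still undecided about (slot, value).

    partitions: dict Partition -> probability
    prior: assumed probability that a goal in an undecided partition has slot == value
    """
    result = {}
    for part, prob in partitions.items():
        slot_fixed = any(s == slot for s, _ in part.constraints)
        already_excluded = (slot, value) in part.excluded
        if slot_fixed or already_excluded:
            result[part] = prob  # nothing new to distinguish in this class
            continue
        matching = Partition(part.constraints + ((slot, value),), part.excluded)
        rest = Partition(part.constraints, part.excluded + ((slot, value),))
        # The parent's probability mass is shared between the two children.
        result[matching] = prob * prior
        result[rest] = prob * (1.0 - prior)
    return result


def summarise(partitions):
    """Map the full belief to a summary point: the two largest probabilities."""
    ranked = sorted(partitions.values(), reverse=True)
    second = ranked[1] if len(ranked) > 1 else 0.0
    return (round(ranked[0], 3), round(second, 3))


# Start with one partition covering every possible goal.
belief = {Partition(): 1.0}
belief = split(belief, "type", "restaurant", prior=0.4)
belief = split(belief, "food", "chinese", prior=0.25)
summary = summarise(belief)  # (0.45, 0.3) for the priors assumed above
```

Only four partitions exist after two splits, however many individual goals the domain contains, and a policy defined over the summary point never needs to see the full belief.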

Although it serves its purpose as a proof of concept, the HIS prototype was built using a simple 1-best recogniser interface, very simplistic probabilistic models, a hand-crafted user simulator and a rudimentary grid-based policy learning method. To fully realise the potential of POMDP-based systems, much more needs to be done and the programme of work set out in this proposal seeks to achieve this. The key areas that will be addressed are more efficient belief state partitioning and monitoring, accurate statistical user models trained on real data, integration of N-best recognition hypotheses, and improved summary state mapping and policy optimisation. The result will be a system which is trained automatically on data, which delivers high performance at low cost, which is significantly more robust to recognition errors, and which can learn and adapt on-line.
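To indicate why N-best recognition hypotheses matter, the sketch below compares a belief update driven by a 1-best result with one driven by an N-best list. It is an illustrative assumption, not the method developed in the project: the observation likelihood is approximated as a confidence-weighted sum over the list, and the user model, dialogue act strings and confidence scores are all made up.

```python
def nbest_likelihood(nbest, user_model, state):
    """Approximate P(o | state) as a confidence-weighted sum over the N-best list.

    nbest: list of (user_act, confidence) pairs from the recogniser
    user_model: function (user_act, state) -> P(user_act | state)
    """
    return sum(conf * user_model(act, state) for act, conf in nbest)


def update_with_nbest(belief, nbest, user_model):
    """Reweight a belief (dict state -> probability) by the N-best likelihood."""
    scores = {s: p * nbest_likelihood(nbest, user_model, s) for s, p in belief.items()}
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("N-best list has zero probability under the current belief")
    return {s: v / total for s, v in scores.items()}


def toy_user_model(user_act, state):
    """Hypothetical user model: users usually state the goal they actually have."""
    return 0.9 if user_act == "inform(type=" + state + ")" else 0.1


prior = {"hotel": 0.5, "restaurant": 0.5}

# 1-best interface: the top hypothesis is treated as if it were certain.
one_best = [("inform(type=hotel)", 1.0)]
# N-best interface: the runner-up hypothesis keeps its (assumed) confidence.
n_best = [("inform(type=hotel)", 0.6), ("inform(type=restaurant)", 0.4)]

post_one_best = update_with_nbest(prior, one_best, toy_user_model)  # hotel: 0.9
post_n_best = update_with_nbest(prior, n_best, toy_user_model)      # hotel: 0.58
```

With the N-best list the belief stays closer to uniform, reflecting the recogniser's own uncertainty instead of committing to a possibly wrong top hypothesis.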

Highlights

Starting from the basic HIS framework, this project has:

Based on these developments, a practical telephone and web-based SDS has been built for Tourist Information. This system was trialled in March 2009 and again in November 2010. The results showed that the POMDP approach does provide robustness to noise, and the 2010 trial demonstrated the additional capability made possible by the project.

Demonstrations of the HIS system have been given on numerous occasions, including at the SLT 2008 conference, at the annual open days of the Cambridge Engineering Department for prospective students, and at the ICT 2010 exhibition in Brussels, as well as for visiting delegations from various companies.

The methods and system developed in this project also underpinned recent keynote talks at the two major annual speech conferences: "Cognitive User Interfaces", ICASSP 2009 in Taiwan and "Still Talking to Machines", Interspeech 2010 in Japan.

Video demonstration of HIS dialogue system

To see the HIS dialogue manager at work, watch this video of an interaction with a user. The graphical user interface shows the internal processing of the dialogue manager and, in particular, how the dialogue state hypotheses and their probabilities evolve during the dialogue.

Publications

M. Gašić and S. Young.
Effective Handling of Dialogue State in the Hidden Information State POMDP-based Dialogue Manager.
In ACM Transactions on Speech and Language Processing, 2010. To appear.

B. Thomson, F. Jurčíček, M. Gašić, S. Keizer, F. Mairesse, K. Yu and S. Young.
Parameter learning for POMDP spoken dialogue models.
In IEEE Workshop on Spoken Language Technology (SLT 2010), Berkeley, CA, December 2010.

B. Thomson, K. Yu, S. Keizer, M. Gašić, F. Jurčíček, F. Mairesse and S. Young.
Bayesian Update of State for the Let's Go Spoken Dialogue Challenge.
In IEEE Workshop on Spoken Language Technology (SLT 2010), Berkeley, CA, December 2010.

F. Jurčíček, B. Thomson, S. Keizer, F. Mairesse, M. Gašić, K. Yu and S. Young.
Natural Belief-Critic: a reinforcement algorithm for parameter estimation in statistical spoken dialogue systems.
In Proceedings of Interspeech, Makuhari, Japan, September 2010.

F. Lefèvre, F. Mairesse, and S. Young.
Cross-Lingual Spoken Language Understanding from Unaligned Data using Discriminative Classification Models and Machine Translation.
In Proceedings of Interspeech, Makuhari, Japan, September 2010.

S. Young.
Still Talking to Machines (Cognitively Speaking).
Keynote speech at Interspeech 2010, Makuhari, Japan, September 2010.

S. Keizer, M. Gašić, F. Jurčíček, F. Mairesse, B. Thomson, K. Yu and S. Young.
Parameter estimation for agenda-based user simulation.
In Proceedings of SIGdial, Tokyo, Japan, September 2010.

M. Gašić, F. Jurčíček, S. Keizer, F. Mairesse, B. Thomson, K. Yu and S. Young.
Gaussian Processes for Fast Policy Optimisation of POMDP-based Dialogue Managers.
In Proceedings of SIGdial, Tokyo, Japan, September 2010.

M. Gašić and S. Young.
Effective Handling of Dialogue State in HIS POMDP Dialogue Manager for Complex Structure Domains.
Technical report, CUED/F-INFENG/TR.650, Cambridge University, 2010.

B. Thomson and S. Young.
Bayesian update of dialogue state: A POMDP framework for spoken dialogue systems.
In Computer Speech and Language, 24(4): 562-588, October 2010.

S. Young.
Cognitive User Interfaces.
In Signal Processing Magazine, 27(3): 128-140, 2010.

F. Mairesse, M. Gašić, F. Jurčíček, S. Keizer, J. Prombonas, B. Thomson, K. Yu and S. Young.
Phrase-based Statistical Language Generation using Graphical Models and Active Learning.
In Proceedings of ACL, Uppsala, Sweden, July 2010.

S. Young, M. Gašić, S. Keizer, F. Mairesse, J. Schatzmann, B. Thomson and K. Yu.
The Hidden Information State Model: a practical framework for POMDP-based spoken dialogue management.
In Computer Speech and Language, 24(2):150-174, April 2010.

M. Gašić, F. Lefèvre, F. Jurčíček, S. Keizer, F. Mairesse, B. Thomson, K. Yu and S. Young.
Back-off Action Selection in Summary Space-Based POMDP dialogue systems.
In Proceedings of ASRU, Merano, Italy, December 2009.

F. Lefèvre, M. Gašić, F. Jurčíček, S. Keizer, F. Mairesse, B. Thomson, K. Yu and S. Young.
k-Nearest Neighbor Monte-Carlo Control Algorithm for POMDP-Based Dialogue Systems.
In Proceedings of SIGdial, London, UK, September 2009.

F. Mairesse, M. Gašić, F. Jurčíček, S. Keizer, B. Thomson, K. Yu and S. Young.
Spoken Language Understanding from Unaligned Data using Discriminative Classification Models.
In IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, Taiwan, April 2009.

S. Keizer, M. Gašić, F. Mairesse, B. Thomson, K. Yu, and S. Young.
Modelling user behaviour in the HIS-POMDP dialogue manager.
In Proceedings of IEEE Workshop on Spoken Language Technology, Goa, India, December 2008.

B. Thomson, K. Yu, M. Gašić, S. Keizer, F. Mairesse, J. Schatzmann and S. Young.
Evaluating semantic-level confidence scores with multiple hypotheses.
In Proceedings of Interspeech, Brisbane, Australia, September 2008.

B. Thomson, M. Gašić, S. Keizer, F. Mairesse, J. Schatzmann, K. Yu and S. Young.
User study of the Bayesian Update of Dialogue State approach to dialogue management.
In Proceedings of Interspeech, Brisbane, Australia, September 2008.

M. Gašić, S. Keizer, F. Mairesse, J. Schatzmann, B. Thomson, K. Yu, and S. Young.
Training and Evaluation of the HIS POMDP Dialogue System in Noise.
In Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue, Columbus, Ohio, June 2008.

B. Thomson, J. Schatzmann and S. Young.
Bayesian Update of Dialogue State for Robust Dialogue Systems.
In Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Las Vegas, NV, March/April 2008.


Simon Keizer
January, 2011