Spoken dialogue systems have a wide range of applications, including call centre automation, control of devices in the home, interactive entertainment, and hands-free applications. Despite their increasing use, however, deployment costs remain high and operational systems continue to be fragile. A major contributor to both of these problems is that the core dialogue manager, which interprets the spoken input and plans the next response, is a deterministic program, hand-crafted and manually tuned for each application. Experience applying statistical techniques in both speech recognition and synthesis has shown that learning from data and using optimal decision making can dramatically improve performance and lower costs.
A natural framework for statistical dialogue modelling is the Markov Decision Process (MDP). However, a major limitation of MDPs is that they require the state of the system to be known exactly, and therefore they do not address the essence of the dialogue management problem, which is to handle the uncertainty caused by speech recognition and understanding errors. The aim of this project is to develop a framework for spoken dialogue systems which uses a more general statistical model called a Partially Observable Markov Decision Process (POMDP). The key assumption in the POMDP is that the state of the system (which includes the goal in the user's mind) can never be known with certainty. Hence, it maintains a probability distribution over all possible states and bases its decisions on this distribution. In effect, the POMDP tracks every possible dialogue hypothesis at every turn, maintaining a probability for each. This provides it with a principled framework for handling ambiguity and uncertainty.
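The idea of maintaining a probability for every hypothesis can be illustrated with the standard POMDP belief update, in which the new belief in a state is proportional to the likelihood of the observation in that state times the probability of reaching it from the previous belief. The following is a minimal sketch and not the project's implementation: the two-goal state space, the identity transition model (the user's goal is assumed not to change) and the likelihood values are all hypothetical choices made for illustration.

```python
from typing import Dict

Belief = Dict[str, float]


def belief_update(
    belief: Belief,
    transition: Dict[str, Dict[str, float]],
    obs_likelihood: Dict[str, float],
) -> Belief:
    """One step of the standard POMDP belief update.

    b'(s') is proportional to P(o | s') * sum_s P(s' | s) * b(s).
    `transition[s][s_next]` and `obs_likelihood[s_next]` are assumed to be
    already conditioned on the last system action and on the observation
    actually received.
    """
    unnormalised = {}
    for s_next in belief:
        # Probability of being in s_next before looking at the observation.
        predicted = sum(transition[s][s_next] * p for s, p in belief.items())
        unnormalised[s_next] = obs_likelihood[s_next] * predicted
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under every state")
    return {s: p / total for s, p in unnormalised.items()}


# Hypothetical two-goal example: the user wants either a hotel or a restaurant.
states = ["hotel", "restaurant"]
# Assumption: the user's goal stays fixed during the dialogue.
identity = {s: {t: 1.0 if s == t else 0.0 for t in states} for s in states}
# Assumption: the noisy recogniser output is more likely if the goal is "hotel".
heard_hotel = {"hotel": 0.7, "restaurant": 0.3}

b0 = {"hotel": 0.5, "restaurant": 0.5}
b1 = belief_update(b0, identity, heard_hotel)  # hotel is now favoured
b2 = belief_update(b1, identity, heard_hotel)  # repeated evidence sharpens the belief
```

Note that neither hypothesis is discarded after a single noisy observation. The competing goal keeps a nonzero probability and can recover if later evidence supports it, which is the robustness property that an MDP with a single known state cannot provide.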
Although this formulation is extremely powerful, it is also computationally very complex since the POMDP state is a vector in a very high dimensional continuous space. This makes direct belief monitoring and policy optimisation essentially intractable and hence little progress has been made towards real applications. Recently, however, the proposer has demonstrated that practical POMDP-based systems are feasible by exploiting two key ideas. Firstly, the complexity of belief monitoring can be greatly reduced by partitioning the state space into equivalence classes. Secondly, in the context of spoken dialogues, it is possible to map dialogue hypotheses into a much-reduced summary space where effective policy optimisation is possible. These ideas have been built into a prototype system called the Hidden Information State (HIS) system and their feasibility has been demonstrated and evaluated in a Tourist Information domain.
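The two ideas above can be sketched in a few lines. This is an illustration under stated assumptions, not the HIS code: the function names, the use of explicit goal sets to represent partitions, the prior over goals, and the choice of the two largest partition probabilities as summary features are all hypothetical. A practical system would represent partitions symbolically rather than enumerating goals.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Partitions = Dict[FrozenSet[str], float]


def split_partitions(
    partitions: Partitions,
    matches: Callable[[str], bool],
    prior: Dict[str, float],
) -> Partitions:
    """Split each partition into goals that satisfy `matches` and the rest.

    Goals inside one partition are treated as equivalent, so a partition's
    probability is shared between its two children in proportion to their
    prior mass. Partitions that are not divided by the test are kept whole.
    """
    result: Partitions = {}
    for goals, prob in partitions.items():
        inside = frozenset(g for g in goals if matches(g))
        outside = goals - inside
        mass = sum(prior[g] for g in goals)
        if mass == 0.0 or not inside or not outside:
            result[goals] = prob  # nothing to split
            continue
        for part in (inside, outside):
            result[part] = prob * sum(prior[g] for g in part) / mass
    return result


def summary_point(partitions: Partitions) -> Tuple[float, float]:
    """Map the full belief to a small summary: the two largest probabilities."""
    probs = sorted(partitions.values(), reverse=True)
    top = probs[0] if probs else 0.0
    second = probs[1] if len(probs) > 1 else 0.0
    return (top, second)


# Hypothetical example: four possible user goals, initially one partition.
prior = {"hotel": 0.25, "bar": 0.25, "restaurant": 0.25, "museum": 0.25}
partitions = {frozenset(prior): 1.0}

# The user mentions a hotel, so only now is the space split on that value.
partitions = split_partitions(partitions, lambda g: g == "hotel", prior)
summary = summary_point(partitions)
```

Splitting only when a value is mentioned keeps the number of tracked hypotheses small, and the fixed-size summary gives policy optimisation a low-dimensional input regardless of how many partitions exist.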
Although it serves its purpose as a proof of concept, the HIS prototype was built using a simple 1-best recogniser interface, very simplistic probabilistic models, a hand-crafted user simulator and a rudimentary grid-based policy learning method. To fully realise the potential of POMDP-based systems, much more needs to be done and the programme of work set out in this proposal seeks to achieve this. The key areas that will be addressed are more efficient belief state partitioning and monitoring, accurate statistical user models trained on real data, integration of N-best recognition hypotheses, and improved summary state mapping and policy optimisation. The result will be a system which is trained automatically on data, which delivers high performance at low cost, which is significantly more robust to recognition errors, and which can learn and adapt on-line.
Based on these developments, a practical telephone and web-based SDS has been built for Tourist Information. This system was trialled in March 2009 and again in November 2010. The results showed that the POMDP approach does provide robustness to noise, and the 2010 trial demonstrated the additional capability made possible by the project.
Demonstrations of the HIS system have been given on numerous occasions, including at the SLT 2008 conference, at the annual open days of the Cambridge Engineering Department for prospective students, and at the ICT 2010 exhibition in Brussels, as well as for visiting delegations from various companies.
The methods and system developed in this project also underpinned recent keynote talks at the two major annual speech conferences: "Cognitive User Interfaces", ICASSP 2009 in Taiwan and "Still Talking to Machines", Interspeech 2010 in Japan.
To see the HIS dialogue manager at work, watch this video of an interaction with a user. The graphical user interface shows the internal processing of the dialogue manager and, in particular, how the dialogue state hypotheses and their probabilities evolve during the dialogue.
M. Gašić and S. Young.
Effective Handling of Dialogue State in
the Hidden Information State POMDP-based Dialogue Manager.
In ACM Transactions on Speech and Language Processing, 2010. To
appear.
B. Thomson, F. Jurčíček, M. Gašić, S. Keizer,
F. Mairesse, K. Yu and S. Young.
Parameter learning
for POMDP spoken dialogue models.
In IEEE Workshop on Spoken Language Technology (SLT 2010), Berkeley,
CA, December 2010.
B. Thomson, K. Yu, S. Keizer, M. Gašić, F. Jurčíček,
F. Mairesse and S. Young.
Bayesian Update of
State for the Let's Go Spoken Dialogue Challenge.
In IEEE Workshop on Spoken Language Technology (SLT 2010), Berkeley,
CA, December 2010.
F. Jurčíček, B. Thomson, S. Keizer, F. Mairesse,
M. Gašić, K. Yu and S. Young.
Natural
Belief-Critic: a reinforcement algorithm for parameter estimation in
statistical spoken dialogue systems.
In Proceedings of Interspeech, Makuhari, Japan, September 2010.
F. Lefèvre, F. Mairesse, and S. Young.
Cross-Lingual Spoken Language Understanding from
Unaligned Data using Discriminative Classification Models and Machine
Translation.
In Proceedings of Interspeech, Makuhari, Japan, September 2010.
S. Young.
Still Talking to
Machines (Cognitively Speaking).
Keynote speech at Interspeech 2010, Makuhari, Japan, September 2010.
S. Keizer, M. Gašić, F. Jurčíček, F. Mairesse,
B. Thomson, K. Yu and S. Young.
Parameter estimation for agenda-based user simulation.
In Proceedings of SIGdial, Tokyo, Japan, September 2010.
M. Gašić, F. Jurčíček, S. Keizer, F. Mairesse,
B. Thomson, K. Yu and S. Young.
Gaussian Processes for Fast Policy Optimisation of POMDP-based Dialogue
Managers.
In Proceedings of SIGdial, Tokyo, Japan, September 2010.
M. Gašić and S. Young.
Effective
Handling of Dialogue State in HIS POMDP Dialogue Manager for Complex
Structure Domains.
Technical report, CUED/F-INFENG/TR.650, Cambridge University, 2010.
B. Thomson and S. Young.
Bayesian update of
dialogue state: A POMDP framework for spoken dialogue systems.
In Computer Speech and Language, 24(4): 562-588, October 2010.
S. Young.
Cognitive User
Interfaces.
In Signal Processing Magazine, 27(3): 128-140, 2010.
F. Mairesse, M. Gašić, F. Jurčíček, S. Keizer,
J. Prombonas, B. Thomson, K. Yu and S. Young.
Phrase-based
Statistical Language Generation using Graphical Models and Active
Learning.
In Proceedings of ACL, Uppsala, Sweden, July 2010.
S. Young, M. Gašić, S. Keizer, F. Mairesse, J. Schatzmann, B. Thomson and
K. Yu.
The Hidden Information State Model: a practical framework for
POMDP-based spoken dialogue management.
In Computer Speech and Language, 24(2):150-174, April 2010.
M. Gašić, F. Lefèvre, F. Jurčíček, S. Keizer,
F. Mairesse, B. Thomson, K. Yu and S. Young.
Back-off Action
Selection in Summary Space-Based POMDP dialogue systems.
In Proceedings of ASRU, Merano, Italy, December 2009.
F. Lefèvre, M. Gašić, F. Jurčíček, S. Keizer,
F. Mairesse, B. Thomson, K. Yu and S. Young.
k-Nearest Neighbor Monte-Carlo Control Algorithm for POMDP-Based Dialogue
Systems.
In Proceedings of SIGdial, London, UK, September 2009.
F. Mairesse, M. Gašić, F. Jurčíček, S. Keizer,
B. Thomson, K. Yu and S. Young.
Spoken Language
Understanding from Unaligned Data using Discriminative Classification
Models.
In IEEE Int. Conf. on Acoustics, Speech, and Signal Processing
(ICASSP), Taipei, Taiwan, April 2009.
S. Keizer, M. Gašić, F. Mairesse, B. Thomson, K. Yu, and S. Young.
Modelling user
behaviour in the HIS-POMDP dialogue manager.
In Proceedings of IEEE Workshop on Spoken Language Technology, Goa,
India, December 2008.
B. Thomson, K. Yu, M. Gašić, S. Keizer, F. Mairesse, J. Schatzmann
and S. Young.
Evaluating
semantic-level confidence scores with multiple hypotheses.
In Proceedings of Interspeech, Brisbane, Australia, September 2008.
B. Thomson, M. Gašić, S. Keizer, F. Mairesse, J. Schatzmann, K. Yu
and S. Young.
User study of the
Bayesian Update of Dialogue State approach to dialogue management.
In Proceedings of Interspeech, Brisbane, Australia, September 2008.
M. Gašić, S. Keizer, F. Mairesse, J. Schatzmann, B. Thomson,
K. Yu, and S. Young.
Training and Evaluation of the HIS POMDP Dialogue System in Noise.
In Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue,
Columbus, Ohio, June 2008.
B. Thomson, J. Schatzmann and S. Young.
Bayesian Update of
Dialogue State for Robust Dialogue Systems.
In Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP),
Las Vegas, NV, March/April 2008.