Stefan Ultes

Sample-efficient Actor-Critic Reinforcement Learning with Supervised Data for Dialogue Management

Jul 05, 2017

Pei-Hao Su, Pawel Budzianowski, Stefan Ultes, Milica Gasic, Steve Young

Abstract: Deep reinforcement learning (RL) methods have significant potential for dialogue policy optimisation. However, they suffer from poor performance in the early stages of learning, which is especially problematic for on-line learning with real users. Two approaches are introduced to tackle this problem. Firstly, to speed up the learning process, two sample-efficient neural network algorithms are presented: trust region actor-critic with experience replay (TRACER) and episodic natural actor-critic with experience replay (eNACER). For TRACER, the trust region helps to control the learning step size and avoid catastrophic model changes. For eNACER, the natural gradient identifies the steepest ascent direction in policy space to speed up convergence. Both models employ off-policy learning with experience replay to improve sample efficiency. Secondly, to mitigate the cold start issue, a corpus of demonstration data is used to pre-train the models prior to on-line reinforcement learning. Combining these two approaches, we demonstrate a practical way to learn deep RL-based dialogue policies and show their effectiveness in a task-oriented information-seeking domain.

* Accepted as a long paper in SigDial 2017
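
The abstract above relies on off-policy learning with experience replay. As a rough illustration only, and not the authors' implementation, the following Python sketch shows a fixed-capacity replay buffer together with a truncated importance weight, a correction commonly used when replayed data comes from an older policy. The names `Transition`, `ReplayBuffer`, `truncated_importance_weight` and the `clip` parameter are hypothetical and do not come from the paper.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class Transition:
    """One dialogue turn as stored for replay (hypothetical layout)."""
    state: tuple
    action: int
    reward: float
    behaviour_prob: float  # probability the data-collecting policy gave to `action`


class ReplayBuffer:
    """Fixed-capacity buffer: the oldest transitions are dropped first."""

    def __init__(self, capacity, seed=None):
        self._items = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, transition):
        self._items.append(transition)

    def sample(self, batch_size):
        # Sample without replacement; return fewer items if the buffer is small.
        k = min(batch_size, len(self._items))
        return self._rng.sample(list(self._items), k)

    def __len__(self):
        return len(self._items)


def truncated_importance_weight(target_prob, behaviour_prob, clip=1.0):
    """Ratio of current-policy to behaviour-policy probability, capped at `clip`."""
    if behaviour_prob <= 0.0:
        raise ValueError("behaviour_prob must be positive")
    return min(clip, target_prob / behaviour_prob)
```

Capping the ratio bounds the variance of updates computed from old data, at the cost of some bias; the cap value used here is an arbitrary assumption.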

A Network-based End-to-End Trainable Task-oriented Dialogue System

Apr 24, 2017

Tsung-Hsien Wen, David Vandyke, Nikola Mrksic, Milica Gasic, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, Steve Young

Abstract: Teaching machines to accomplish tasks by conversing naturally with humans is challenging. Currently, developing task-oriented dialogue systems requires creating multiple components, and typically this involves either a large amount of handcrafting or acquiring costly labelled datasets to solve a statistical learning problem for each component. In this work we introduce a neural network-based text-in, text-out end-to-end trainable goal-oriented dialogue system along with a new way of collecting dialogue data based on a novel pipelined Wizard-of-Oz framework. This approach allows us to develop dialogue systems easily and without making too many assumptions about the task at hand. The results show that the model can converse with human subjects naturally whilst helping them to accomplish tasks in a restaurant search domain.

* published at EACL 2017

Exploiting Sentence and Context Representations in Deep Neural Models for Spoken Language Understanding

Oct 13, 2016

Lina M. Rojas Barahona, Milica Gasic, Nikola Mrkšić, Pei-Hao Su, Stefan Ultes, Tsung-Hsien Wen, Steve Young

Abstract: This paper presents a deep learning architecture for the semantic decoder component of a Statistical Spoken Dialogue System. In a slot-filling dialogue, the semantic decoder predicts the dialogue act and a set of slot-value pairs from a set of n-best hypotheses returned by the automatic speech recognition (ASR) component. Most current models for spoken language understanding assume (i) word-aligned semantic annotations, as in sequence taggers, and (ii) delexicalisation, or a mapping of input words to domain-specific concepts using heuristics that try to capture morphological variation but that scale neither to other domains nor to language variation (e.g., morphology, synonyms, paraphrasing). In this work the semantic decoder is trained using unaligned semantic annotations, and it uses distributed semantic representation learning to overcome the limitations of explicit delexicalisation. The proposed architecture uses a convolutional neural network for the sentence representation and a long short-term memory network for the context representation. Results are presented for the publicly available DSTC2 corpus and an In-car corpus which is similar to DSTC2 but has a significantly higher word error rate (WER).

Dialogue manager domain adaptation using Gaussian process reinforcement learning

Sep 09, 2016

Milica Gasic, Nikola Mrksic, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, David Vandyke, Tsung-Hsien Wen, Steve Young

Abstract: Spoken dialogue systems allow humans to interact with machines using natural speech. As such, they have many benefits. By using speech as the primary communication medium, a computer interface can facilitate swift, human-like acquisition of information. In recent years, speech interfaces have become ever more popular, as is evident from the rise of personal assistants such as Siri, Google Now, Cortana and Amazon Alexa. Recently, data-driven machine learning methods have been applied to dialogue modelling and the results achieved for limited-domain applications are comparable to or outperform traditional approaches. Methods based on Gaussian processes are particularly effective as they enable good models to be estimated from limited training data. Furthermore, they provide an explicit estimate of the uncertainty which is particularly useful for reinforcement learning. This article explores the additional steps that are necessary to extend these methods to model multiple dialogue domains. We show that Gaussian process reinforcement learning is an elegant framework that naturally supports a range of methods, including prior knowledge, Bayesian committee machines and multi-agent learning, for facilitating extensible and adaptable dialogue systems.

* accepted for publication in Computer Speech and Language
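
The abstract above names Bayesian committee machines as one way to combine estimates across dialogue domains. As a hedged sketch only, assuming scalar Gaussian predictions and a zero-mean prior, the following Python function applies the standard Bayesian committee machine combination rule to per-domain estimates. The function name `bcm_combine` and its arguments are hypothetical, and the article's actual formulation may differ.

```python
def bcm_combine(means, variances, prior_variance):
    """Combine independent Gaussian estimates with a Bayesian committee machine.

    Assumes a zero-mean prior with variance `prior_variance`. The combined
    precision is the sum of the committee members' precisions minus
    (M - 1) copies of the prior precision, so the prior is counted only once.
    """
    if len(means) != len(variances) or not means:
        raise ValueError("means and variances must be non-empty and equal in length")
    m = len(means)
    precision = sum(1.0 / v for v in variances) - (m - 1) / prior_variance
    if precision <= 0.0:
        raise ValueError("combined precision must be positive")
    variance = 1.0 / precision
    mean = variance * sum(mu / v for mu, v in zip(means, variances))
    return mean, variance


# Two hypothetical domain-specific estimates of the same quantity.
combined_mean, combined_variance = bcm_combine(
    means=[1.0, 3.0], variances=[0.5, 0.5], prior_variance=1.0
)
```

With a single committee member the rule returns that member's estimate unchanged, and members that are no more certain than the prior contribute nothing beyond it.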

Conditional Generation and Snapshot Learning in Neural Dialogue Systems

Jun 10, 2016

Tsung-Hsien Wen, Milica Gasic, Nikola Mrksic, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, David Vandyke, Steve Young

Abstract: Recently, a variety of LSTM-based conditional language models (LM) have been applied across a range of language generation tasks. In this work we study various model architectures and different ways to represent and aggregate the source information in an end-to-end neural dialogue system framework. A method called snapshot learning is also proposed to facilitate learning from supervised sequential signals by applying a companion cross-entropy objective function to the conditioning vector. The experimental and analytical results demonstrate firstly that competition occurs between the conditioning vector and the LM, and that the differing architectures provide different trade-offs between the two. Secondly, the discriminative power and transparency of the conditioning vector are key to providing both model interpretability and better performance. Thirdly, snapshot learning leads to consistent performance improvements independent of which architecture is used.

Continuously Learning Neural Dialogue Management

Jun 08, 2016

Pei-Hao Su, Milica Gasic, Nikola Mrksic, Lina Rojas-Barahona, Stefan Ultes, David Vandyke, Tsung-Hsien Wen, Steve Young

Abstract: We describe a two-step approach for dialogue management in task-oriented spoken dialogue systems. A unified neural network framework is proposed to enable the system to first learn by supervision from a set of dialogue data and then continuously improve its behaviour via reinforcement learning, all using gradient-based algorithms on one single model. The experiments demonstrate the supervised model's effectiveness in the corpus-based evaluation, with user simulation, and with paid human subjects. The use of reinforcement learning further improves the model's performance in both interactive settings, especially under higher-noise conditions.

On-line Active Reward Learning for Policy Optimisation in Spoken Dialogue Systems

Jun 02, 2016

Pei-Hao Su, Milica Gasic, Nikola Mrksic, Lina Rojas-Barahona, Stefan Ultes, David Vandyke, Tsung-Hsien Wen, Steve Young

Abstract: The ability to compute an accurate reward function is essential for optimising a dialogue policy via reinforcement learning. In real-world applications, using explicit user feedback as the reward signal is often unreliable and costly to collect. This problem can be mitigated if the user's intent is known in advance or data is available to pre-train a task success predictor off-line. In practice, neither of these applies for most real-world applications. Here we propose an on-line learning framework whereby the dialogue policy is jointly trained alongside the reward model via active learning with a Gaussian process model. This Gaussian process operates on a continuous-space dialogue representation generated in an unsupervised fashion using a recurrent neural network encoder-decoder. The experimental results demonstrate that the proposed framework is able to significantly reduce data annotation costs and mitigate noisy user feedback in dialogue policy learning.

* Accepted as a long paper in ACL 2016
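
The active learning idea in the abstract above can be illustrated with a generic uncertainty-sampling rule: ask the user for feedback only when the reward model is unsure whether the dialogue succeeded. The Python sketch below is an assumption-laden illustration, not the paper's method. The function names, the `margin` value and the example probabilities are all hypothetical, and the probabilities stand in for the output of a Gaussian process success classifier.

```python
def should_query_user(p_success, margin=0.15):
    """Return True when the predicted success probability is near 0.5.

    `p_success` stands in for the reward model's predictive probability that
    the dialogue was successful; `margin` is a hypothetical uncertainty band.
    """
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must lie in [0, 1]")
    return abs(p_success - 0.5) < margin


def count_queries(predictions, margin=0.15):
    """Count how many dialogues would trigger a feedback request."""
    return sum(1 for p in predictions if should_query_user(p, margin))


# Hypothetical predictive probabilities for five dialogues.
example_predictions = [0.95, 0.52, 0.10, 0.40, 0.88]
queries_needed = count_queries(example_predictions)
```

Confident predictions are used directly as the reward signal under this rule, so annotation effort is spent only on the dialogues the model cannot yet judge.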
