Michel Galley

An Adversarially-Learned Turing Test for Dialog Generation Models

Apr 16, 2021

Xiang Gao, Yizhe Zhang, Michel Galley, Bill Dolan

Abstract: The design of better automated dialogue evaluation metrics offers the potential to accelerate evaluation research on conversational AI. However, existing trainable dialogue evaluation models are generally restricted to classifiers trained in a purely supervised manner, which suffer a significant risk from adversarial attacks (e.g., a nonsensical response that enjoys a high classification score). To alleviate this risk, we propose an adversarial training approach to learn a robust model, ATT (Adversarial Turing Test), that discriminates machine-generated responses from human-written replies. In contrast to previous perturbation-based methods, our discriminator is trained by iteratively generating unrestricted and diverse adversarial examples using reinforcement learning. The key benefit of this unrestricted adversarial training approach is allowing the discriminator to improve robustness in an iterative attack-defense game. Our discriminator shows high accuracy on strong attackers including DialoGPT and GPT-3.

* 7 pages, 2 figures

Ask what's missing and what's useful: Improving Clarification Question Generation using Global Knowledge

Apr 14, 2021

Bodhisattwa Prasad Majumder, Sudha Rao, Michel Galley, Julian McAuley

Abstract: The ability to generate clarification questions, i.e., questions that identify useful missing information in a given context, is important in reducing ambiguity. Humans use previous experience with similar contexts to form a global view and compare it to the given context to ascertain what is missing and what is useful in the context. Inspired by this, we propose a model for clarification question generation where we first identify what is missing by taking a difference between the global and the local view and then train a model to identify what is useful and generate a question about it. Our model outperforms several baselines as judged by both automatic metrics and humans.
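
The "difference between the global and the local view" can be illustrated with a toy sketch. The abstract does not say how either view is represented, so the version below assumes each context is simply a set of attribute strings, and that the global view keeps attributes appearing in at least a given share of similar contexts. The function name `missing_attributes` and the `min_share` parameter are hypothetical and are not taken from the paper or its code release.

```python
from collections import Counter


def missing_attributes(local_context, similar_contexts, min_share=0.5):
    """Return attributes common across similar contexts but absent locally.

    Assumption: contexts are plain sets of attribute strings. This is an
    illustrative stand-in, not the representation used in the paper.
    """
    counts = Counter()
    for ctx in similar_contexts:
        counts.update(set(ctx))  # count each attribute once per context
    threshold = min_share * len(similar_contexts)
    # Global view: attributes seen in at least `min_share` of similar contexts.
    global_view = {attr for attr, n in counts.items() if n >= threshold}
    # Candidates for a clarification question: global view minus local view.
    return sorted(global_view - set(local_context))


local = {"brand", "color"}
similar = [
    {"brand", "color", "dimensions"},
    {"brand", "dimensions", "warranty"},
    {"color", "dimensions"},
]
print(missing_attributes(local, similar))  # ['dimensions']
```

In this toy setting, "dimensions" is what a clarification question would ask about; deciding which missing items are actually useful is the part the paper trains a model for.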

* Accepted at NAACL 2021. Code is available at https://github.com/microsoft/clarification-qgen-globalinfo

Data Augmentation for Abstractive Query-Focused Multi-Document Summarization

Mar 02, 2021

Ramakanth Pasunuru, Asli Celikyilmaz, Michel Galley, Chenyan Xiong, Yizhe Zhang, Mohit Bansal, Jianfeng Gao

Abstract: The progress in Query-focused Multi-Document Summarization (QMDS) has been limited by the lack of sufficient large-scale high-quality training datasets. We present two QMDS training datasets, which we construct using two data augmentation methods: (1) transferring the commonly used single-document CNN/Daily Mail summarization dataset to create the QMDSCNN dataset, and (2) mining search-query logs to create the QMDSIR dataset. These two datasets have complementary properties, i.e., QMDSCNN has real summaries but queries are simulated, while QMDSIR has real queries but simulated summaries. To cover both these real summary and query aspects, we build abstractive end-to-end neural network models on the combined datasets that yield new state-of-the-art transfer results on DUC datasets. We also introduce new hierarchical encoders that enable a more efficient encoding of the query together with multiple documents. Empirical results demonstrate that our data augmentation and encoding methods outperform baseline models on automatic metrics, as well as on human evaluations along multiple attributes.

* AAAI 2021 (13 pages)

Text Editing by Command

Oct 24, 2020

Felix Faltings, Michel Galley, Gerold Hintz, Chris Brockett, Chris Quirk, Jianfeng Gao, Bill Dolan

Abstract: A prevailing paradigm in neural text generation is one-shot generation, where text is produced in a single step. The one-shot setting is inadequate, however, when the constraints the user wishes to impose on the generated text are dynamic, especially when authoring longer documents. We address this limitation with an interactive text generation setting in which the user interacts with the system by issuing commands to edit existing text. To this end, we propose a novel text editing task, and introduce WikiDocEdits, a dataset of single-sentence edits crawled from Wikipedia. We show that our Interactive Editor, a transformer-based model trained on this dataset, outperforms baselines and obtains positive results in both automatic and human evaluations. We present empirical and qualitative analyses of this model's performance.

Dialogue Response Ranking Training with Large-Scale Human Feedback Data

Sep 15, 2020

Xiang Gao, Yizhe Zhang, Michel Galley, Chris Brockett, Bill Dolan

Abstract: Existing open-domain dialog models are generally trained to minimize the perplexity of target human responses. However, some human replies are more engaging than others, spawning more follow-up interactions. Current conversational models are increasingly capable of producing turns that are context-relevant, but in order to produce compelling agents, these models need to be able to predict and optimize for turns that are genuinely engaging. We leverage social media feedback data (number of replies and upvotes) to build a large-scale training dataset for feedback prediction. To alleviate possible distortion between the feedback and engagingness, we convert the ranking problem to a comparison of response pairs which involve few confounding factors. We trained DialogRPT, a set of GPT-2 based models, on 133M pairs of human feedback data, and the resulting ranker outperformed several baselines. In particular, our ranker outperforms the conventional dialog perplexity baseline by a large margin on predicting Reddit feedback. We finally combine the feedback prediction models and a human-like scoring model to rank the machine-generated dialog responses. Crowd-sourced human evaluation shows that our ranking method correlates better with real human preferences than baseline models.
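
Converting a ranking problem into pairwise comparisons can be sketched roughly as follows. This is only an illustration under assumptions: it pairs replies that share the same context and keeps a pair only when their feedback counts differ by at least `min_gap`. The helper names `build_pairs` and `pairwise_accuracy`, the `min_gap` parameter, and the sample replies are hypothetical; the paper's actual pair-selection rules are not described in the abstract.

```python
from itertools import combinations


def build_pairs(replies, min_gap=1):
    """Turn (text, feedback_count) replies to one context into ordered pairs.

    Each returned pair is (more_feedback_text, less_feedback_text).
    Assumption: replies under the same context share most confounders.
    """
    pairs = []
    for (a, fa), (b, fb) in combinations(replies, 2):
        if abs(fa - fb) < min_gap:
            continue  # skip near-ties; they carry no clear preference
        pairs.append((a, b) if fa > fb else (b, a))
    return pairs


def pairwise_accuracy(score, pairs):
    """Share of pairs where the scorer prefers the higher-feedback reply."""
    if not pairs:
        return 0.0
    correct = sum(score(pos) > score(neg) for pos, neg in pairs)
    return correct / len(pairs)


replies = [
    ("I love that movie too!", 12),
    ("ok", 0),
    ("Which scene was your favorite?", 30),
]
pairs = build_pairs(replies)
# `len` is a toy placeholder scorer, not DialogRPT.
print(pairwise_accuracy(len, pairs))
```

A trained ranker would take the place of the placeholder scorer; pairwise accuracy of this kind is one simple way to compare rankers on held-out feedback pairs.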

* Accepted to appear at EMNLP 2020

MixingBoard: a Knowledgeable Stylized Integrated Text Generation Platform

May 17, 2020

Xiang Gao, Michel Galley, Bill Dolan

Abstract: We present MixingBoard, a platform for quickly building demos with a focus on knowledge grounded stylized text generation. We unify existing text generation algorithms in a shared codebase and further adapt earlier algorithms for constrained generation. To borrow advantages from different models, we implement strategies for cross-model integration, from the token probability level to the latent space level. An interface to external knowledge is provided via a module that retrieves on-the-fly relevant knowledge from passages on the web or any document collection. A user interface for local development, remote webpage access, and a RESTful API are provided to make it simple for users to build their own demos.
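
As a rough illustration of integration at the token probability level, the sketch below linearly interpolates two next-token distributions and renormalizes. The abstract does not state which mixing rule MixingBoard uses, so the interpolation, the function name `mix_token_probs`, and the `weight` parameter are assumptions made purely for illustration.

```python
def mix_token_probs(p_a, p_b, weight=0.5):
    """Interpolate two next-token distributions given as {token: prob} dicts.

    Assumption: simple linear interpolation; tokens missing from one model
    are treated as having probability 0 under that model.
    """
    vocab = set(p_a) | set(p_b)
    mixed = {
        tok: weight * p_a.get(tok, 0.0) + (1.0 - weight) * p_b.get(tok, 0.0)
        for tok in vocab
    }
    total = sum(mixed.values())
    if total <= 0.0:
        raise ValueError("both distributions are empty or all-zero")
    # Renormalize so the result is a proper distribution.
    return {tok: p / total for tok, p in mixed.items()}


p_model_a = {"hello": 0.6, "hi": 0.4}
p_model_b = {"hi": 0.5, "greetings": 0.5}
mixed = mix_token_probs(p_model_a, p_model_b)
print(max(mixed, key=mixed.get))  # hi
```

Here neither model alone ranks "hi" first, but the mixture does, which is the kind of effect that combining models at the token level is meant to achieve.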

* accepted at ACL 2020

A Controllable Model of Grounded Response Generation

May 01, 2020

Zeqiu Wu, Michel Galley, Chris Brockett, Yizhe Zhang, Xiang Gao, Chris Quirk, Rik Koncel-Kedziorski, Jianfeng Gao, Hannaneh Hajishirzi, Mari Ostendorf (+1 more)

Abstract: Current end-to-end neural conversation models inherently lack the flexibility to impose semantic control in the response generation process. This control is essential to ensure that users' semantic intents are satisfied and to impose a degree of specificity on generated outputs. Attempts to boost informativeness alone come at the expense of factual accuracy, as attested by GPT-2's propensity to "hallucinate" facts. While this may be mitigated by access to background knowledge, there is scant guarantee of relevance and informativeness in generated responses. We propose a framework that we call controllable grounded response generation (CGRG), in which lexical control phrases are either provided by a user or automatically extracted by a content planner from dialogue context and grounding knowledge. Quantitative and qualitative results show that, using this framework, a GPT-2 based model trained on a conversation-like Reddit dataset outperforms strong generation baselines.

The Eighth Dialog System Technology Challenge

Nov 14, 2019

Seokhwan Kim, Michel Galley, Chulaka Gunasekara, Sungjin Lee, Adam Atkinson, Baolin Peng, Hannes Schulz, Jianfeng Gao, Jinchao Li, Mahmoud Adada (+11 more)

Abstract: This paper introduces the Eighth Dialog System Technology Challenge. In line with recent challenges, the eighth edition focuses on applying end-to-end dialog technologies in a pragmatic way for multi-domain task-completion, noetic response selection, audio visual scene-aware dialog, and schema-guided dialog state tracking tasks. This paper describes the task definition, provided datasets, and evaluation set-up for each track. We also summarize the results of the submitted systems to highlight the overall trends of the state-of-the-art technologies for the tasks.

* Submitted to NeurIPS 2019 3rd Conversational AI Workshop

DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation

Nov 01, 2019

Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan

Abstract: We present a large, tunable neural conversational response generation model, DialoGPT (dialogue generative pre-trained transformer). Trained on 147M conversation-like exchanges extracted from Reddit comment chains over a period spanning from 2005 through 2017, DialoGPT extends the Hugging Face PyTorch transformer to attain a performance close to human both in terms of automatic and human evaluation in single-turn dialogue settings. We show that conversational systems that leverage DialoGPT generate more relevant, contentful and context-consistent responses than strong baseline systems. The pre-trained model and training pipeline are publicly released to facilitate research into neural response generation and the development of more intelligent open-domain dialogue systems.

Structuring Latent Spaces for Stylized Response Generation

Sep 03, 2019

Xiang Gao, Yizhe Zhang, Sungjin Lee, Michel Galley, Chris Brockett, Jianfeng Gao, Bill Dolan

Abstract: Generating responses in a targeted style is a useful yet challenging task, especially in the absence of parallel data. With limited data, existing methods tend to generate responses that are either less stylized or less context-relevant. We propose StyleFusion, which bridges conversation modeling and non-parallel style transfer by sharing a structured latent space. This structure allows the system to generate stylized relevant responses by sampling in the neighborhood of the conversation model prediction, and continuously control the style level. We demonstrate this method using dialogues from Reddit data and two sets of sentences with distinct styles (arXiv and Sherlock Holmes novels). Automatic and human evaluation show that, without sacrificing appropriateness, the system generates responses of the targeted style and outperforms competitive baselines.

* Accepted to appear at EMNLP 2019 (long)
