Douwe Kiela

Exploring Zero-Shot Emergent Communication in Embodied Multi-Agent Populations

Oct 29, 2020
Kalesha Bullard, Franziska Meier, Douwe Kiela, Joelle Pineau, Jakob Foerster

Effective communication is an important skill for enabling information exchange and cooperation in multi-agent settings. Indeed, emergent communication is now a vibrant field of research, with common settings involving discrete cheap-talk channels. One limitation of this setting is that it does not allow for the emergent protocols to generalize beyond the training partners. Furthermore, so far emergent communication has primarily focused on the use of symbolic channels. In this work, we extend this line of work to a new modality, by studying agents that learn to communicate via actuating their joints in a 3D environment. We show that under realistic assumptions, a non-uniform distribution of intents and a common-knowledge energy cost, these agents can find protocols that generalize to novel partners. We also explore and analyze specific difficulties associated with finding these solutions in practice. Finally, we propose and evaluate initial training improvements to address these challenges, involving both specific training curricula and providing the latent feature that can be coordinated on during training.

ANLIzing the Adversarial Natural Language Inference Dataset

Oct 24, 2020
Adina Williams, Tristan Thrush, Douwe Kiela

We perform an in-depth error analysis of Adversarial NLI (ANLI), a recently introduced large-scale human-and-model-in-the-loop natural language inference dataset collected over multiple rounds. We propose a fine-grained annotation scheme of the different aspects of inference that are responsible for the gold classification labels, and use it to hand-code all three of the ANLI development sets. We use these annotations to answer a variety of interesting questions: which inference types are most common, which models have the highest performance on each reasoning type, and which types are the most challenging for state-of-the-art models? We hope that our annotations will enable more fine-grained evaluation of models trained on ANLI, provide us with a deeper understanding of where models fail and succeed, and help us determine how to train better models in the future.

* 33 pages, 1 figure, 24 tables

Learning Optimal Representations with the Decodable Information Bottleneck

Sep 27, 2020
Yann Dubois, Douwe Kiela, David J. Schwab, Ramakrishna Vedantam

We address the question of characterizing and finding optimal representations for supervised learning. Traditionally, this question has been tackled using the Information Bottleneck, which compresses the inputs while retaining information about the targets, in a decoder-agnostic fashion. In machine learning, however, our goal is not compression but rather generalization, which is intimately linked to the predictive family or decoder of interest (e.g. linear classifier). We propose the Decodable Information Bottleneck (DIB) that considers information retention and compression from the perspective of the desired predictive family. As a result, DIB gives rise to representations that are optimal in terms of expected test performance and can be estimated with guarantees. Empirically, we show that the framework can be used to enforce a small generalization gap on downstream classifiers and to predict the generalization ability of neural networks.
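
For reference, the classical Information Bottleneck that the abstract contrasts with is usually written as the Lagrangian below. This is the standard textbook form, not the DIB objective itself; the symbols (X for inputs, Y for targets, Z for the representation, β for the trade-off weight) are notation assumed here rather than taken from the abstract.

```latex
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```

The first term rewards compressing the input and the second rewards keeping information about the target, with neither term referring to a particular decoder. That decoder-agnostic property is what the abstract proposes to change.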

* Accepted at NeurIPS 2020

Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval

Sep 27, 2020
Wenhan Xiong, Xiang Lorraine Li, Srini Iyer, Jingfei Du, Patrick Lewis, William Yang Wang, Yashar Mehdad, Wen-tau Yih, Sebastian Riedel, Douwe Kiela, Barlas Oğuz

We propose a simple and efficient multi-hop dense retrieval approach for answering complex open-domain questions, which achieves state-of-the-art performance on two multi-hop datasets, HotpotQA and multi-evidence FEVER. Contrary to previous work, our method does not require access to any corpus-specific information, such as inter-document hyperlinks or human-annotated entity markers, and can be applied to any unstructured text corpus. Our system also yields a much better efficiency-accuracy trade-off, matching the best published accuracy on HotpotQA while being 10 times faster at inference time.
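
As a rough illustration of the general idea of multi-hop retrieval over unstructured text, the sketch below retrieves one passage per hop and re-encodes the question together with the evidence found so far. It is not the authors' system: the bag-of-words `embed` function stands in for a learned dense encoder, and the greedy one-passage-per-hop selection, the function names, and the toy corpus are all assumptions made for this example.

```python
from collections import Counter

def embed(text):
    """Toy stand-in for a learned dense encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())

def score(query_vec, passage_vec):
    """Inner product between two sparse count vectors."""
    return sum(count * passage_vec[token] for token, count in query_vec.items())

def multi_hop_retrieve(question, corpus, hops=2):
    """Greedy retrieval: each hop re-encodes the question plus passages found so far."""
    chain = []
    query = question
    for _ in range(hops):
        q_vec = embed(query)
        candidates = [p for p in corpus if p not in chain]
        best = max(candidates, key=lambda p: score(q_vec, embed(p)))
        chain.append(best)
        query = query + " " + best  # condition the next hop on retrieved evidence
    return chain

# Hypothetical three-passage corpus; no hyperlinks or entity markers are used.
corpus = [
    "the author of bluebird is alice",
    "alice was born in lisbon",
    "carol was born in porto",
]
question = "where was the author of bluebird born"
chain = multi_hop_retrieve(question, corpus)
```

In this toy setup the second passage only outranks the third once the first hop has added "alice" to the query, which is the dependency between hops that a single-shot retriever would miss.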

The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes

Jun 08, 2020
Douwe Kiela, Hamed Firooz, Aravind Mohan, Vedanuj Goswami, Amanpreet Singh, Pratik Ringshia, Davide Testuggine

This work proposes a new challenge set for multimodal classification, focusing on detecting hate speech in multimodal memes. It is constructed such that unimodal models struggle and only multimodal models can succeed: difficult examples ("benign confounders") are added to the dataset to make it hard to rely on unimodal signals. The task requires subtle reasoning, yet is straightforward to evaluate as a binary classification problem. We provide baseline performance numbers for unimodal models, as well as for multimodal models with various degrees of sophistication. We find that state-of-the-art methods perform poorly compared to humans (64.73% vs. 84.7% accuracy), illustrating the difficulty of the task and highlighting the challenge that this important problem poses to the community.

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

May 22, 2020
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela

Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architectures. Additionally, providing provenance for their decisions and updating their world knowledge remain open research problems. Pre-trained models with a differentiable access mechanism to explicit non-parametric memory can overcome this issue, but have so far only been investigated for extractive downstream tasks. We explore a general-purpose fine-tuning recipe for retrieval-augmented generation (RAG) -- models which combine pre-trained parametric and non-parametric memory for language generation. We introduce RAG models where the parametric memory is a pre-trained seq2seq model and the non-parametric memory is a dense vector index of Wikipedia, accessed with a pre-trained neural retriever. We compare two RAG formulations: one which conditions on the same retrieved passages across the whole generated sequence, and another which can use different passages per token. We fine-tune and evaluate our models on a wide range of knowledge-intensive NLP tasks and set the state-of-the-art on three open domain QA tasks, outperforming parametric seq2seq models and task-specific retrieve-and-extract architectures. For language generation tasks, we find that RAG models generate more specific, diverse and factual language than a state-of-the-art parametric-only seq2seq baseline.
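
The difference between the two formulations can be made concrete with a small numerical sketch. The code below assumes the retriever has already produced a probability for each of a few passages and the generator a probability for each target token given each passage; the function names and the toy numbers are hypothetical and are not taken from the abstract or any released implementation.

```python
import math

def rag_sequence_prob(doc_probs, token_probs):
    """Marginalize over passages once for the whole sequence.

    doc_probs[k]      : retriever probability of passage k (toy values)
    token_probs[k][i] : generator probability of target token i given passage k
    """
    return sum(
        p_z * math.prod(per_token)
        for p_z, per_token in zip(doc_probs, token_probs)
    )

def rag_token_prob(doc_probs, token_probs):
    """Marginalize over passages separately at every token position."""
    seq_len = len(token_probs[0])
    return math.prod(
        sum(p_z * per_token[i] for p_z, per_token in zip(doc_probs, token_probs))
        for i in range(seq_len)
    )

# Two passages, two target tokens: each passage supports only one of the tokens.
doc_probs = [0.5, 0.5]
token_probs = [
    [1.0, 0.0],  # passage 0 explains token 0 only
    [0.0, 1.0],  # passage 1 explains token 1 only
]
seq_level = rag_sequence_prob(doc_probs, token_probs)
tok_level = rag_token_prob(doc_probs, token_probs)
```

With these toy numbers the sequence-level score is zero because no single passage supports both tokens, while the token-level score is positive because each token may draw on a different passage.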

Multi-Dimensional Gender Bias Classification

May 01, 2020
Emily Dinan, Angela Fan, Ledell Wu, Jason Weston, Douwe Kiela, Adina Williams

Machine learning models are trained to find patterns in data. NLP models can inadvertently learn socially undesirable patterns when training on gender-biased text. In this work, we propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions: bias from the gender of the person being spoken about, bias from the gender of the person being spoken to, and bias from the gender of the speaker. Using this fine-grained framework, we automatically annotate eight large-scale datasets with gender information. In addition, we collect a novel, crowdsourced evaluation benchmark of utterance-level gender rewrites. Distinguishing between gender bias along multiple dimensions is important, as it enables us to train finer-grained gender bias classifiers. We show our classifiers prove valuable for a variety of important applications, such as controlling for gender bias in generative models, detecting gender bias in arbitrary text, and shedding light on offensive language in terms of genderedness.

Unsupervised Question Decomposition for Question Answering

Feb 22, 2020
Ethan Perez, Patrick Lewis, Wen-tau Yih, Kyunghyun Cho, Douwe Kiela

We aim to improve question answering (QA) by decomposing hard questions into easier sub-questions that existing QA systems can answer. Since collecting labeled decompositions is cumbersome, we propose an unsupervised approach to produce sub-questions. Specifically, by leveraging >10M questions from Common Crawl, we learn to map from the distribution of multi-hop questions to the distribution of single-hop sub-questions. We answer sub-questions with an off-the-shelf QA model and incorporate the resulting answers in a downstream, multi-hop QA system. On a popular multi-hop QA dataset, HotpotQA, we show large improvements over a strong baseline, especially on adversarial and out-of-domain questions. Our method is generally applicable and automatically learns to decompose questions of different classes, while matching the performance of decomposition methods that rely heavily on hand-engineering and annotation.
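
The three stages described above (decompose, answer the sub-questions, pass the results to a multi-hop system) can be sketched as a small pipeline. This is only a skeleton under stated assumptions: the function name, the three callables, and the way sub-answers are joined into a context string are hypothetical, and the stub models in the usage example are placeholders rather than real QA systems.

```python
from typing import Callable, List

def answer_with_decomposition(
    question: str,
    decompose: Callable[[str], List[str]],
    single_hop_qa: Callable[[str], str],
    multi_hop_qa: Callable[[str, str], str],
) -> str:
    """Answer a hard question by first answering easier sub-questions."""
    sub_questions = decompose(question)
    sub_answers = [single_hop_qa(sq) for sq in sub_questions]
    # Pair each sub-question with its answer and hand both to the downstream model.
    context = " ".join(f"{sq} {sa}" for sq, sa in zip(sub_questions, sub_answers))
    return multi_hop_qa(question, context)

# Placeholder components, used only to show the data flow.
def stub_decompose(q):
    return ["sub-question one?", "sub-question two?"]

def stub_single_hop(sq):
    return "answer to " + sq

def stub_multi_hop(q, context):
    return f"final answer using [{context}]"

result = answer_with_decomposition(
    "a hard question?", stub_decompose, stub_single_hop, stub_multi_hop
)
```

Passing the components in as callables keeps the sketch agnostic about which decomposition model or off-the-shelf QA model is plugged in.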

I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents

Feb 10, 2020
Shrimai Prabhumoye, Margaret Li, Jack Urbanek, Emily Dinan, Douwe Kiela, Jason Weston, Arthur Szlam

Dialogue research tends to distinguish between chit-chat and goal-oriented tasks. While the former is arguably more naturalistic and has a wider use of language, the latter has clearer metrics and a straightforward learning signal. Humans effortlessly combine the two, for example engaging in chit-chat with the goal of exchanging information or eliciting a specific response. Here, we bridge the divide between these two domains in the setting of a rich multi-player text-based fantasy environment where agents and humans engage in both actions and dialogue. Specifically, we train a goal-oriented model with reinforcement learning against an imitation-learned "chit-chat" model with two approaches: the policy either learns to pick a topic or learns to pick an utterance given the top-K utterances from the chit-chat model. We show that both models outperform an inverse model baseline and can converse naturally with their dialogue partner in order to achieve goals.
