Pratik Ringshia

Mephisto: A Framework for Portable, Reproducible, and Iterative Crowdsourcing

Jan 12, 2023

Jack Urbanek, Pratik Ringshia

Abstract: We introduce Mephisto, a framework to make crowdsourcing for research more reproducible, transparent, and collaborative. Mephisto provides abstractions that cover a broad set of task designs and data collection workflows, and provides a simple user experience that makes best practices easy defaults. In this whitepaper we discuss the current state of data collection and annotation in ML research, establish the motivation for building a shared framework to enable researchers to create and open-source data collection and annotation tools as part of their publication, and outline a set of suggested requirements for a system to facilitate these goals. We then step through our resolution in Mephisto, explaining the abstractions we use and our design decisions around the user experience, and sharing implementation details and where they align with the original motivations. We also discuss current limitations, as well as future work towards continuing to deliver on the framework's initial goals. Mephisto is available as an open source project, and its documentation can be found at www.mephisto.ai.

Dynabench: Rethinking Benchmarking in NLP

Apr 07, 2021

Douwe Kiela, Max Bartolo, Yixin Nie, Divyansh Kaushik, Atticus Geiger, Zhengxuan Wu, Bertie Vidgen, Grusha Prasad, Amanpreet Singh, Pratik Ringshia (+9 more)

Abstract: We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the-loop dataset creation: annotators seek to create examples that a target model will misclassify, but that another person will not. In this paper, we argue that Dynabench addresses a critical need in our community: contemporary models quickly achieve outstanding performance on benchmark tasks but nonetheless fail on simple challenge examples and falter in real-world scenarios. With Dynabench, dataset creation, model development, and model assessment can directly inform each other, leading to more robust and informative benchmarks. We report on four initial NLP tasks, illustrating these concepts and highlighting the promise of the platform, and address potential objections to dynamic benchmarking as a new standard for the field.

* NAACL 2021

Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions

Jul 13, 2020

Stephen Roller, Y-Lan Boureau, Jason Weston, Antoine Bordes, Emily Dinan, Angela Fan, David Gunning, Da Ju, Margaret Li, Spencer Poff (+6 more)

Abstract: We present our view of what is necessary to build an engaging open-domain conversational agent: covering the qualities of such an agent, the pieces of the puzzle that have been built so far, and the gaping holes we have not filled yet. We present a biased view, focusing on work done by our own group, while citing related work in each area. In particular, we discuss in detail the properties of continual learning, providing engaging content, and being well-behaved -- and how to measure success in providing them. We end with a discussion of our experience and learnings, and our recommendations to the community.

The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes

Jun 08, 2020

Douwe Kiela, Hamed Firooz, Aravind Mohan, Vedanuj Goswami, Amanpreet Singh, Pratik Ringshia, Davide Testuggine

Abstract: This work proposes a new challenge set for multimodal classification, focusing on detecting hate speech in multimodal memes. It is constructed such that unimodal models struggle and only multimodal models can succeed: difficult examples ("benign confounders") are added to the dataset to make it hard to rely on unimodal signals. The task requires subtle reasoning, yet is straightforward to evaluate as a binary classification problem. We provide baseline performance numbers for unimodal models, as well as for multimodal models with various degrees of sophistication. We find that state-of-the-art methods perform poorly compared to humans (64.73% vs. 84.7% accuracy), illustrating the difficulty of the task and highlighting the challenge that this important problem poses to the community.

Generating Interactive Worlds with Text

Dec 04, 2019

Angela Fan, Jack Urbanek, Pratik Ringshia, Emily Dinan, Emma Qian, Siddharth Karamcheti, Shrimai Prabhumoye, Douwe Kiela, Tim Rocktäschel, Arthur Szlam (+1 more)

Abstract: Procedurally generating cohesive and interesting game environments is challenging and time-consuming. In order for the relationships between the game elements to be natural, common sense has to be encoded into the arrangement of the elements. In this work, we investigate a machine learning approach for world creation using content from the multi-player text adventure game environment LIGHT. We introduce neural network based models to compositionally arrange locations, characters, and objects into a coherent whole. In addition to creating worlds based on existing elements, our models can generate new game content. Humans can also leverage our models to interactively aid in worldbuilding. We show that the game environments created with our approach are cohesive, diverse, and preferred by human evaluators compared to other machine learning based world construction algorithms.
