Andy Coenen

Creative Writing with an AI-Powered Writing Assistant: Perspectives from Professional Writers

Nov 09, 2022
Daphne Ippolito, Ann Yuan, Andy Coenen, Sehmon Burnam

Recent developments in natural language generation (NLG) using neural language models have brought us closer than ever to the goal of building AI-powered creative writing tools. However, most prior work on human-AI collaboration in the creative writing domain has evaluated new systems with amateur writers, typically in contrived user studies of limited scope. In this work, we commissioned 13 professional, published writers from a diverse set of creative writing backgrounds to craft stories using Wordcraft, a text editor with built-in AI-powered writing assistance tools. Using interviews and participant journals, we discuss the potential of NLG to have significant impact in the creative writing domain--especially with respect to brainstorming, generation of story details, world-building, and research assistance. Experienced writers, more so than amateurs, typically have well-developed systems and methodologies for writing, as well as distinctive voices and target audiences. Our work highlights the challenges in building for these writers; NLG technologies struggle to preserve style and authorial voice, and they lack deep understanding of story contents. In order for AI-powered writing assistants to realize their full potential, it is essential that they take into account the diverse goals and expertise of human writers.

The Case for a Single Model that can Both Generate Continuations and Fill in the Blank

Jun 09, 2022
Daphne Ippolito, Liam Dugan, Emily Reif, Ann Yuan, Andy Coenen, Chris Callison-Burch

The task of inserting text into a specified position in a passage, known as fill-in-the-blank (FitB), is useful for a variety of applications where writers interact with a natural language generation (NLG) system to craft text. While previous work has tackled this problem with models trained specifically to do the fill-in-the-blank task, a more useful model is one that can effectively perform _both_ FitB and continuation. In this work, we evaluate the feasibility of using a single model to do both tasks. We show that models pre-trained with a FitB-style objective are capable of both tasks, while models pre-trained for continuation are not. Finally, we show how FitB models can be easily finetuned to allow for fine-grained control over the length and word choice of the generation.

* NAACL 2022 Findings  
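
As a concrete illustration of the single-model framing, here is a minimal sketch that uses T5's span-infilling sentinels as a stand-in for the paper's blank token: placing the blank mid-passage gives FitB, while placing it at the end reduces to ordinary continuation. The model choice and input format are illustrative assumptions, not the paper's exact setup.

```python
# One span-infilling model serving both FitB and continuation.
# T5's <extra_id_0> sentinel stands in for the paper's blank token.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def infill(prefix: str, suffix: str, max_new_tokens: int = 32) -> str:
    """Fill the blank between prefix and suffix; an empty suffix is continuation."""
    prompt = f"{prefix} <extra_id_0> {suffix}".strip()
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # The decoded output is the model's prediction for the blanked span.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# FitB: the blank sits in the middle of the passage.
print(infill("The knight crept into", ", sword drawn."))
# Continuation: the same model, with the blank placed at the end.
print(infill("The knight crept into the cave", ""))
```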

SynthBio: A Case Study in Human-AI Collaborative Curation of Text Datasets

Nov 11, 2021
Ann Yuan, Daphne Ippolito, Vitaly Nikolaev, Chris Callison-Burch, Andy Coenen, Sebastian Gehrmann

NLP researchers need more, higher-quality text datasets. Human-labeled datasets are expensive to collect, while datasets collected via automatic retrieval from the web such as WikiBio are noisy and can include undesired biases. Moreover, data sourced from the web is often included in datasets used to pretrain models, leading to inadvertent cross-contamination of training and test sets. In this work we introduce a novel method for efficient dataset curation: we use a large language model to provide seed generations to human raters, thereby changing dataset authoring from a writing task to an editing task. We use our method to curate SynthBio - a new evaluation set for WikiBio - composed of structured attribute lists describing fictional individuals, mapped to natural language biographies. We show that our dataset of fictional biographies is less noisy than WikiBio, and also more balanced with respect to gender and nationality.

* 10 pages, 2 figures, accepted to NeurIPS 2021 Datasets and Benchmarks Track 
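
A minimal sketch of the edit-not-write loop described in the abstract: a language model drafts a biography from a structured attribute list, and a human rater then corrects the draft rather than writing from scratch. The prompt wording, stand-in model, and attribute schema below are illustrative assumptions, not the paper's exact pipeline.

```python
# Draft a seed biography from structured attributes; a human edits the result.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # stand-in LLM

def draft_biography(attributes: dict) -> str:
    """Turn a structured attribute list into a seed biography for editing."""
    facts = ", ".join(f"{k}: {v}" for k, v in attributes.items())
    prompt = f"Attributes: {facts}\nBiography:"
    out = generator(prompt, max_new_tokens=80, num_return_sequences=1)
    return out[0]["generated_text"][len(prompt):].strip()

seed = draft_biography({
    "name": "Mara Ilves",           # fictional individual, as in SynthBio
    "occupation": "marine biologist",
    "nationality": "Estonian",
})
print(seed)  # a human rater now edits this draft instead of authoring from scratch
```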

A Recipe For Arbitrary Text Style Transfer with Large Language Models

Sep 16, 2021
Emily Reif, Daphne Ippolito, Ann Yuan, Andy Coenen, Chris Callison-Burch, Jason Wei

In this paper, we leverage large language models (LMs) to perform zero-shot text style transfer. We present a prompting method that we call augmented zero-shot learning, which frames style transfer as a sentence rewriting task and requires only a natural language instruction, without model fine-tuning or exemplars in the target style. Augmented zero-shot learning is simple and demonstrates promising results not just on standard style transfer tasks such as sentiment, but also on arbitrary transformations such as "make this melodramatic" or "insert a metaphor."
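
The augmented zero-shot recipe can be sketched as follows: the prompt stacks a few rewriting exemplars in *unrelated* styles, then poses the arbitrary target instruction in the same template, so no exemplars in the target style are needed. The exemplar wording below is illustrative; see the paper for the exact template and examples.

```python
# Build an augmented zero-shot prompt: varied rewrite exemplars, then the
# arbitrary target transformation in the same "Here is some text" template.
EXEMPLARS = [
    ("The dog ran across the yard.", "more descriptive",
     "The scruffy brown dog sprinted across the sun-bleached yard."),
    ("I had a great day!", "more negative",
     "I had an awful day."),
]

def build_prompt(sentence: str, instruction: str) -> str:
    parts = []
    for src, attr, rewritten in EXEMPLARS:
        parts.append(
            f"Here is some text: {{{src}}}. Here is a rewrite of the text, "
            f"which is {attr}: {{{rewritten}}}"
        )
    # The final, open-ended entry carries the arbitrary target instruction.
    parts.append(
        f"Here is some text: {{{sentence}}}. Here is a rewrite of the text, "
        f"which is {instruction}: {{"
    )
    return " ".join(parts)

prompt = build_prompt("The sky was blue.", "more melodramatic")
# Send `prompt` to any large LM and read the completion up to the closing "}".
print(prompt)
```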

Wordcraft: a Human-AI Collaborative Editor for Story Writing

Jul 15, 2021
Andy Coenen, Luke Davis, Daphne Ippolito, Emily Reif, Ann Yuan

As neural language models grow in effectiveness, they are increasingly being applied in real-world settings. However, these applications tend to be limited in the modes of interaction they support. In this extended abstract, we propose Wordcraft, an AI-assisted editor for story writing in which a writer and a dialog system collaborate to write a story. Our novel interface uses few-shot learning and the natural affordances of conversation to support a variety of interactions. Our editor provides a sandbox for writers to probe the boundaries of transformer-based language models and paves the way for future human-in-the-loop training pipelines and novel evaluation methods.

* First Workshop on Bridging Human-Computer Interaction and Natural Language Processing at EACL 2021  
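
A rough sketch of the conversational few-shot framing: each editor operation (continue, elaborate, rewrite) is posed to a general-purpose dialog model as a writer's turn, with in-context examples establishing the assistant's role. The turn format and example below are assumptions for illustration, not Wordcraft's actual prompts.

```python
# Frame editor operations as dialog turns for a general-purpose LM.
FEW_SHOT = (
    "Writer: Continue the story: 'The lighthouse went dark at midnight.'\n"
    "Assistant: A low horn sounded across the bay, and Mara reached for her coat.\n"
)

def editor_request(operation: str, story_so_far: str) -> str:
    """Pose an editing operation (continue, rewrite, describe...) as a dialog turn."""
    return f"{FEW_SHOT}Writer: {operation}: '{story_so_far}'\nAssistant:"

prompt = editor_request("Describe the setting in more detail",
                        "The lighthouse went dark at midnight.")
# `prompt` is sent to the dialog model; its reply is surfaced as a suggestion
# the writer can accept, edit, or discard in the editor.
print(prompt)
```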

An Interpretability Illusion for BERT

Apr 14, 2021
Tolga Bolukbasi, Adam Pearce, Ann Yuan, Andy Coenen, Emily Reif, Fernanda Viégas, Martin Wattenberg

We describe an "interpretability illusion" that arises when analyzing the BERT model. Activations of individual neurons in the network may spuriously appear to encode a single, simple concept, when in fact they are encoding something far more complex. The same effect holds for linear combinations of activations. We trace the source of this illusion to geometric properties of BERT's embedding space as well as the fact that common text corpora represent only narrow slices of possible English sentences. We provide a taxonomy of model-learned concepts and discuss methodological implications for interpretability research, especially the importance of testing hypotheses on multiple data sets.
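
To make the failure mode concrete, here is a sketch of the kind of probe the illusion arises from: ranking sentences by a single neuron's activation and reading the top hits as "the concept". The layer and neuron indices below are arbitrary; the point of the paper is that such an interpretation must be re-tested across multiple datasets before it is trusted.

```python
# Rank sentences by one BERT neuron's activation; re-run on a second corpus
# to test whether the apparent "concept" survives the change of dataset.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def neuron_activation(sentence: str, layer: int, neuron: int) -> float:
    """Max activation of one hidden unit over the sentence's tokens."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[layer][0, :, neuron].max().item()

def top_sentences(corpus, layer=8, neuron=337, k=3):
    return sorted(corpus, key=lambda s: -neuron_activation(s, layer, neuron))[:k]

# If top_sentences(corpus_a) and top_sentences(corpus_b) suggest different
# "concepts", the single-corpus interpretation was an illusion.
```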

The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models

Aug 12, 2020
Ian Tenney, James Wexler, Jasmijn Bastings, Tolga Bolukbasi, Andy Coenen, Sebastian Gehrmann, Ellen Jiang, Mahima Pushkarna, Carey Radebaugh, Emily Reif, Ann Yuan

We present the Language Interpretability Tool (LIT), an open-source platform for visualization and understanding of NLP models. We focus on core questions about model behavior: Why did my model make this prediction? When does it perform poorly? What happens under a controlled change in the input? LIT integrates local explanations, aggregate analysis, and counterfactual generation into a streamlined, browser-based interface to enable rapid exploration and error analysis. We include case studies for a diverse set of workflows, including exploring counterfactuals for sentiment analysis, measuring gender bias in coreference systems, and exploring local behavior in text generation. LIT supports a wide range of models--including classification, seq2seq, and structured prediction--and is highly extensible through a declarative, framework-agnostic API. LIT is under active development, with code and full documentation available at https://github.com/pair-code/lit.
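
A minimal serving sketch following the pattern in the LIT README (linked above): wrap a model and dataset in the lit_nlp API and launch the browser UI. The toy classes here are placeholders, and the exact API surface may differ across LIT versions, so treat this as a shape of the integration rather than a definitive recipe.

```python
# Serve a toy classifier and dataset in the LIT browser UI.
from lit_nlp import dev_server
from lit_nlp import server_flags
from lit_nlp.api import dataset as lit_dataset
from lit_nlp.api import model as lit_model
from lit_nlp.api import types as lit_types

class ToyDataset(lit_dataset.Dataset):
    def __init__(self):
        self._examples = [{"sentence": "great movie", "label": "1"},
                          {"sentence": "terrible plot", "label": "0"}]
    def spec(self):
        return {"sentence": lit_types.TextSegment(),
                "label": lit_types.CategoryLabel(vocab=["0", "1"])}

class ToyModel(lit_model.Model):
    def input_spec(self):
        return {"sentence": lit_types.TextSegment()}
    def output_spec(self):
        return {"probas": lit_types.MulticlassPreds(vocab=["0", "1"],
                                                    parent="label")}
    def predict_minibatch(self, inputs):
        for ex in inputs:  # stand-in scorer; a real model runs inference here
            pos = 1.0 if "great" in ex["sentence"] else 0.0
            yield {"probas": [1.0 - pos, pos]}

lit_demo = dev_server.Server(models={"toy": ToyModel()},
                             datasets={"toy_data": ToyDataset()},
                             **server_flags.get_flags())
lit_demo.serve()  # then open the printed local URL in a browser
```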

Visualizing and Measuring the Geometry of BERT

Jun 06, 2019
Andy Coenen, Emily Reif, Ann Yuan, Been Kim, Adam Pearce, Fernanda Viégas, Martin Wattenberg

Transformer architectures show significant promise for natural language processing. Given that a single pretrained model can be fine-tuned to perform well on many different tasks, these networks appear to extract generally useful linguistic features. A natural question is how such networks represent this information internally. This paper describes qualitative and quantitative investigations of one particularly effective model, BERT. At a high level, linguistic features seem to be represented in separate semantic and syntactic subspaces. We find evidence of a fine-grained geometric representation of word senses. We also present empirical descriptions of syntactic representations in both attention matrices and individual word embeddings, as well as a mathematical argument to explain the geometry of these representations.

* 8 pages, 5 figures 
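
A small probe in the spirit of the word-sense analysis: embed a polysemous word in several contexts with BERT and compare the resulting vectors, where same-sense contexts should sit measurably closer together. The sentences and similarity check are an illustrative setup, not the paper's exact experiment.

```python
# Compare contextual BERT embeddings of "bank" across senses.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def embed_word(sentence: str, word: str) -> torch.Tensor:
    """Final-layer embedding of `word`'s first occurrence in context."""
    inputs = tokenizer(sentence, return_tensors="pt")
    idx = inputs["input_ids"][0].tolist().index(
        tokenizer.convert_tokens_to_ids(word))
    with torch.no_grad():
        return model(**inputs).last_hidden_state[0, idx]

river1 = embed_word("We sat on the bank of the river.", "bank")
river2 = embed_word("The boat drifted toward the muddy bank.", "bank")
money = embed_word("She deposited the check at the bank.", "bank")

cos = torch.nn.functional.cosine_similarity
print(cos(river1, river2, dim=0).item())  # same sense: expected higher
print(cos(river1, money, dim=0).item())   # different sense: expected lower
```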