Chris Callison-Burch

Choice-75: A Dataset on Decision Branching in Script Learning

Sep 21, 2023
Zhaoyi Joey Hou, Li Zhang, Chris Callison-Burch

Script learning studies how daily events unfold. Previous work tends to treat a script as a linear sequence of events, ignoring the branches that arise from people's circumstantial choices. We therefore propose Choice-75, the first benchmark that challenges intelligent systems to predict decisions given descriptive scenarios; it contains 75 scripts and more than 600 scenarios. While large language models show decent overall performance, there is still notable room for improvement on many hard scenarios.

Kani: A Lightweight and Highly Hackable Framework for Building Language Model Applications

Sep 11, 2023
Andrew Zhu, Liam Dugan, Alyssa Hwang, Chris Callison-Burch

Language model applications are becoming increasingly popular and complex, often including features like tool usage and retrieval augmentation. However, existing frameworks for such applications are often opinionated, deciding for developers how their prompts ought to be formatted and imposing limitations on customizability and reproducibility. To solve this, we present Kani: a lightweight, flexible, and model-agnostic open-source framework for building language model applications. Kani helps developers implement a variety of complex features by supporting the core building blocks of chat interaction: model interfacing, chat management, and robust function calling. All Kani core functions are easily overridable and well documented to empower developers to customize functionality for their own needs. Kani thus serves as a useful tool for researchers, hobbyists, and industry professionals alike to accelerate their development while retaining interoperability and fine-grained control.

* In submission to NLP-OSS 
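
To make the "core building blocks" concrete, here is a minimal sketch of a chat application with custom function calling written in the style of Kani's documented API; the engine choice, model name, and the example weather function are illustrative assumptions, not taken from the paper.

```python
from typing import Annotated

from kani import AIParam, Kani, ai_function, chat_in_terminal
from kani.engines.openai import OpenAIEngine

# Model interfacing: any supported engine can be swapped in here.
engine = OpenAIEngine("YOUR_OPENAI_API_KEY", model="gpt-4")

class WeatherKani(Kani):
    # Function calling: the decorated method is exposed to the model as a tool.
    @ai_function()
    def get_weather(self, city: Annotated[str, AIParam(desc="City to look up")]):
        """Look up the current weather for a city."""
        return f"The weather in {city} is sunny."  # stand-in for a real API call

# Chat management is handled by the Kani instance; its core methods are overridable.
ai = WeatherKani(engine, system_prompt="You are a helpful assistant.")
chat_in_terminal(ai)
```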

ParaGuide: Guided Diffusion Paraphrasers for Plug-and-Play Textual Style Transfer

Sep 04, 2023
Zachary Horvitz, Ajay Patel, Chris Callison-Burch, Zhou Yu, Kathleen McKeown

Textual style transfer is the task of transforming stylistic properties of text while preserving meaning. Target "styles" can be defined in numerous ways, ranging from single attributes (e.g., formality) to authorship (e.g., Shakespeare). Previous unsupervised style-transfer approaches generally rely on significant amounts of labeled data for only a fixed set of styles or require large language models. In contrast, we introduce a novel diffusion-based framework for general-purpose style transfer that can be flexibly adapted to arbitrary target styles at inference time. Our parameter-efficient approach, ParaGuide, leverages paraphrase-conditioned diffusion models alongside gradient-based guidance from both off-the-shelf classifiers and strong existing style embedders to transform the style of text while preserving semantic information. We validate the method on the Enron Email Corpus, with both human and automatic evaluations, and find that it outperforms strong baselines on formality, sentiment, and even authorship style transfer.
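
The gradient-based guidance can be pictured as nudging each intermediate diffusion state along the gradient of an external classifier's loss. The toy PyTorch sketch below illustrates only that update rule; the tiny linear modules, guidance scale, and step count are placeholders, not ParaGuide's actual models or schedule.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the real components (placeholders, not ParaGuide's models).
denoiser = nn.Linear(16, 16)          # predicts a cleaner latent from a noisy one
style_classifier = nn.Linear(16, 2)   # off-the-shelf attribute classifier head

def guided_denoise_step(x_t: torch.Tensor, target_style: int, guidance_scale: float = 1.0):
    """One toy denoising step with gradient-based style guidance."""
    x_t = x_t.detach().requires_grad_(True)
    x0_hat = denoiser(x_t)                               # predicted clean latent
    logits = style_classifier(x0_hat)
    loss = nn.functional.cross_entropy(logits, torch.tensor([target_style]))
    grad = torch.autograd.grad(loss, x_t)[0]
    # Nudge the prediction toward the target style before the next step.
    return (x0_hat - guidance_scale * grad).detach()

x = torch.randn(1, 16)
for _ in range(5):                     # a handful of toy denoising steps
    x = guided_denoise_step(x, target_style=1, guidance_scale=0.5)
```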

CALYPSO: LLMs as Dungeon Masters' Assistants

Aug 15, 2023
Andrew Zhu, Lara J. Martin, Andrew Head, Chris Callison-Burch

The role of a Dungeon Master, or DM, in the game Dungeons & Dragons is to perform multiple tasks simultaneously. The DM must digest information about the game setting and monsters, synthesize scenes to present to other players, and respond to the players' interactions with the scene. Doing all of these tasks while maintaining consistency within the narrative and story world is no small feat of human cognition, making the task tiring and unapproachable to new players. Large language models (LLMs) like GPT-3 and ChatGPT have shown remarkable abilities to generate coherent natural language text. In this paper, we conduct a formative evaluation with DMs to establish the use cases of LLMs in D&D and tabletop gaming generally. We introduce CALYPSO, a system of LLM-powered interfaces that support DMs with information and inspiration specific to their own scenario. CALYPSO distills game context into bite-sized prose and helps brainstorm ideas without distracting the DM from the game. When given access to CALYPSO, DMs reported that it generated high-fidelity text suitable for direct presentation to players, and low-fidelity ideas that the DM could develop further while maintaining their creative agency. We see CALYPSO as exemplifying a paradigm of AI-augmented tools that provide synchronous creative assistance within established game worlds, and tabletop gaming more broadly.

* 11 pages, 4 figures. AIIDE 2023 

Open-Domain Hierarchical Event Schema Induction by Incremental Prompting and Verification

Jul 05, 2023
Sha Li, Ruining Zhao, Manling Li, Heng Ji, Chris Callison-Burch, Jiawei Han

Event schemas are a form of world knowledge about the typical progression of events. Recent methods for event schema induction use information extraction systems to construct a large number of event graph instances from documents, and then learn to generalize the schema from such instances. In contrast, we propose to treat event schemas as a form of commonsense knowledge that can be derived from large language models (LLMs). This new paradigm greatly simplifies the schema induction process and allows us to handle both hierarchical relations and temporal relations between events in a straightforward way. Since event schemas have complex graph structures, we design an incremental prompting and verification method to break down the construction of a complex event graph into three stages: event skeleton construction, event expansion, and event-event relation verification. Compared to directly using LLMs to generate a linearized graph, our method can generate large and complex schemas with 7.2% F1 improvement in temporal relations and 31.0% F1 improvement in hierarchical relations. In addition, compared to the previous state-of-the-art closed-domain schema induction model, human assessors were able to cover ~10% more events when translating the schemas into coherent stories and rated our schemas 1.3 points higher (on a 5-point scale) in terms of readability.

* Accepted to ACL 2023. 19 pages with appendix 
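
The three-stage procedure can be sketched as a simple loop over LLM queries, as below; the prompt wording, the `query_llm` callable, and the returned data layout are illustrative assumptions rather than the authors' exact prompts.

```python
from typing import Callable

def induce_schema(scenario: str, query_llm: Callable[[str], str]) -> dict:
    """Illustrative three-stage induction: skeleton -> expansion -> relation verification."""
    # Stage 1: event skeleton construction.
    skeleton = query_llm(f"List the major events of a typical '{scenario}' scenario, one per line.")
    events = [line.strip() for line in skeleton.splitlines() if line.strip()]

    # Stage 2: event expansion -- ask for subevents of each skeleton event.
    children = {}
    for event in events:
        reply = query_llm(f"In a '{scenario}' scenario, what subevents make up '{event}'? One per line.")
        children[event] = [line.strip() for line in reply.splitlines() if line.strip()]

    # Stage 3: event-event relation verification -- keep only relations the model confirms.
    temporal = []
    for before, after in zip(events, events[1:]):
        answer = query_llm(f"Does '{before}' usually happen before '{after}'? Answer yes or no.")
        if answer.strip().lower().startswith("yes"):
            temporal.append((before, after))

    return {"events": events, "hierarchy": children, "temporal": temporal}
```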

Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation with Offline Models

Jun 01, 2023
Liam Dugan, Anshul Wadhawan, Kyle Spence, Chris Callison-Burch, Morgan McGuire, Victor Zordan

Recent work in speech-to-speech translation (S2ST) has focused primarily on offline settings, where the full input utterance is available before any output is given. This, however, is not reasonable in many real-world scenarios. In latency-sensitive applications, rather than waiting for the full utterance, translations should be spoken as soon as the information in the input is present. In this work, we introduce a system for simultaneous S2ST targeting real-world use cases. Our system supports translation from 57 languages to English with tunable parameters for dynamically adjusting the latency of the output -- including four policies for determining when to speak an output sequence. We show that these policies achieve offline-level accuracy with minimal increases in latency over a greedy (wait-k) baseline. We open-source our evaluation code and interactive test script to aid future SimulS2ST research and application development.

* To appear at INTERSPEECH 2023 
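
As a point of reference, the greedy wait-k baseline mentioned above can be sketched as: hold output until k source chunks have arrived, then commit newly produced target words as each further chunk comes in. The chunking granularity and the toy translator below are assumptions for illustration, not the system's actual components.

```python
def wait_k_policy(source_chunks, translate, k=3):
    """Yield partial translations under a greedy wait-k schedule (illustrative only)."""
    seen, emitted = [], 0
    for chunk in source_chunks:                  # chunks arrive incrementally (e.g., ASR segments)
        seen.append(chunk)
        if len(seen) < k:                        # wait until k chunks are available
            continue
        hypothesis = translate(" ".join(seen))   # re-translate the prefix seen so far
        words = hypothesis.split()
        if len(words) > emitted:                 # speak only the newly committed words
            yield " ".join(words[emitted:])
            emitted = len(words)

# Example with a trivial "translator" that upper-cases the input.
for spoken in wait_k_policy(["hola", "como", "estas", "hoy"], translate=str.upper, k=2):
    print(spoken)
```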

Representation Of Lexical Stylistic Features In Language Models' Embedding Space

May 31, 2023
Qing Lyu, Marianna Apidianaki, Chris Callison-Burch

The representation space of pretrained Language Models (LMs) encodes rich information about words and their relationships (e.g., similarity, hypernymy, polysemy) as well as abstract semantic notions (e.g., intensity). In this paper, we demonstrate that lexical stylistic notions such as complexity, formality, and figurativeness, can also be identified in this space. We show that it is possible to derive a vector representation for each of these stylistic notions from only a small number of seed pairs. Using these vectors, we can characterize new texts in terms of these dimensions by performing simple calculations in the corresponding embedding space. We conduct experiments on five datasets and find that static embeddings encode these features more accurately at the level of words and phrases, whereas contextualized LMs perform better on sentences. The lower performance of contextualized representations at the word level is partially attributable to the anisotropy of their vector space, which can be corrected to some extent using techniques like standardization.

* Accepted at *SEM 2023 
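
The seed-pair construction can be sketched in a few lines: average the embedding differences over the seed pairs to obtain a style direction, then score new inputs by projecting onto it. The dummy embedder and seed pairs below are placeholders; the paper's exact seeds and embedding models differ.

```python
import numpy as np

def style_vector(seed_pairs, embed):
    """Average difference between 'high' and 'low' seed embeddings (e.g., complex vs. simple words)."""
    diffs = [embed(high) - embed(low) for high, low in seed_pairs]
    v = np.mean(diffs, axis=0)
    return v / np.linalg.norm(v)

def style_score(text, vector, embed):
    """Project a new word or phrase onto the style direction; higher = more of the attribute."""
    e = embed(text)
    return float(np.dot(e / np.linalg.norm(e), vector))

# Dummy embedder for illustration; in practice `embed` would come from a static or contextualized LM.
rng = np.random.default_rng(0)
vocab_vectors = {}
def embed(word):
    return vocab_vectors.setdefault(word, rng.normal(size=300))

complexity = style_vector([("utilize", "use"), ("commence", "start")], embed)
print(style_score("endeavor", complexity))
```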

This Land is {Your, My} Land: Evaluating Geopolitical Biases in Language Models

May 24, 2023
Bryan Li, Chris Callison-Burch

We introduce the notion of geopolitical bias -- a tendency to report different geopolitical knowledge depending on the linguistic context. As a case study, we consider territorial disputes between countries. For example, for the widely contested Spratly Islands, would an LM be more likely to say they belong to China if asked in Chinese, vs. to the Philippines if asked in Tagalog? To evaluate whether such biases exist, we first collect a dataset of territorial disputes from Wikipedia, then associate each territory with a set of multilingual, multiple-choice questions. This dataset, termed BorderLines, consists of 250 territories with questions in 45 languages. We pose these question sets to language models and analyze geopolitical bias in their responses through several proposed quantitative metrics. The metrics compare responses across question languages as well as against the actual geopolitical situation. Geopolitical bias requires a uniquely cross-lingual evaluation, in contrast with prior work's monolingual (mostly English) focus on bias evaluation. Its existence shows that the knowledge of LMs, unlike that of multilingual humans, is inconsistent across languages.
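
One way to picture the cross-language comparison is an agreement score between the answers a model gives in each question language, as in the hedged sketch below; the metric, reference-language choice, and data layout are illustrative assumptions, not the paper's exact definitions.

```python
def consistency(responses_by_language: dict[str, dict[str, str]], reference_lang: str = "en") -> float:
    """Fraction of territory answers that match the reference-language answer (illustrative metric)."""
    reference = responses_by_language[reference_lang]
    matches, total = 0, 0
    for lang, answers in responses_by_language.items():
        if lang == reference_lang:
            continue
        for territory, country in answers.items():
            total += 1
            matches += int(country == reference.get(territory))
    return matches / total if total else 0.0

# Toy example: the model names different controllers depending on the question language.
responses = {
    "en": {"Spratly Islands": "Philippines"},
    "zh": {"Spratly Islands": "China"},
    "tl": {"Spratly Islands": "Philippines"},
}
print(consistency(responses))   # 0.5 -- answers disagree across languages
```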

OpenPI2.0: An Improved Dataset for Entity Tracking in Texts

May 24, 2023
Li Zhang, Hainiu Xu, Abhinav Kommula, Niket Tandon, Chris Callison-Burch

Representing texts as information about entities has long been deemed effective in event reasoning. We propose OpenPI2.0, an improved dataset for tracking entity states in procedural texts. OpenPI2.0 features not only canonicalized entities that facilitate evaluation, but also salience annotations including both manual labels and automatic predictions. Regarding entity salience, we provide a survey on annotation subjectivity, modeling feasibility, and downstream applications in tasks such as question answering and classical planning.
