Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Alex Xie

Identifying & Interactively Refining Ambiguous User Goals for Data Visualization Code Generation

Oct 10, 2025

Mert İnan, Anthony Sicilia, Alex Xie, Saujas Vaduguru, Daniel Fried, Malihe Alikhani

Abstract:Establishing shared goals is a fundamental step in human-AI communication. However, ambiguities can lead to outputs that seem correct but fail to reflect the speaker's intent. In this paper, we explore this issue with a focus on the data visualization domain, where ambiguities in natural language impact the generation of code that visualizes data. The availability of multiple views on the contextual (e.g., the intended plot and the code rendering the plot) allows for a unique and comprehensive analysis of diverse ambiguity types. We develop a taxonomy of types of ambiguity that arise in this task and propose metrics to quantify them. Using Matplotlib problems from the DS-1000 dataset, we demonstrate that our ambiguity metrics better correlate with human annotations than uncertainty baselines. Our work also explores how multi-turn dialogue can reduce ambiguity, therefore, improve code accuracy by better matching user goals. We evaluate three pragmatic models to inform our dialogue strategies: Gricean Cooperativity, Discourse Representation Theory, and Questions under Discussion. A simulated user study reveals how pragmatic dialogues reduce ambiguity and enhance code accuracy, highlighting the value of multi-turn exchanges in code generation.

Via

Access Paper or Ask Questions

RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing

Mar 10, 2025

Yiqing Xie, Alex Xie, Divyanshu Sheth, Pengfei Liu, Daniel Fried, Carolyn Rose

Figure 1 for RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing

Figure 2 for RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing

Figure 3 for RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing

Figure 4 for RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing

Abstract:We present RepoST, a scalable method to construct environments that provide execution feedback for repository-level code generation for both training and evaluation. Unlike existing works that aim to build entire repositories for execution, which is challenging for both human and LLMs, we provide execution feedback with sandbox testing, which isolates a given target function and its dependencies to a separate script for testing. Sandbox testing reduces the complexity of external dependencies and enables constructing environments at a large scale. We use our method to construct RepoST-Train, a large-scale train set with 7,415 functions from 832 repositories. Training with the execution feedback provided by RepoST-Train leads to a performance gain of 5.5% Pass@1 on HumanEval and 3.5% Pass@1 on RepoEval. We also build an evaluation dataset, RepoST-Eval, and benchmark 12 code generation models.

Via

Access Paper or Ask Questions

From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models

Jun 24, 2024

Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf, Alex Xie, Graham Neubig, Ilia Kulikov, Zaid Harchaoui

Abstract:One of the most striking findings in modern research on large language models (LLMs) is that scaling up compute during training leads to better results. However, less attention has been given to the benefits of scaling compute during inference. This survey focuses on these inference-time approaches. We explore three areas under a unified mathematical formalism: token-level generation algorithms, meta-generation algorithms, and efficient generation. Token-level generation algorithms, often called decoding algorithms, operate by sampling a single token at a time or constructing a token-level search space and then selecting an output. These methods typically assume access to a language model's logits, next-token distributions, or probability scores. Meta-generation algorithms work on partial or full sequences, incorporating domain knowledge, enabling backtracking, and integrating external information. Efficient generation methods aim to reduce token costs and improve the speed of generation. Our survey unifies perspectives from three research communities: traditional natural language processing, modern LLMs, and machine learning systems.

Via

Access Paper or Ask Questions

CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks

Mar 31, 2024

Yiqing Xie, Alex Xie, Divyanshu Sheth, Pengfei Liu, Daniel Fried, Carolyn Rose

Figure 1 for CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks

Figure 2 for CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks

Figure 3 for CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks

Figure 4 for CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks

Abstract:To facilitate evaluation of code generation systems across diverse scenarios, we present CodeBenchGen, a framework to create scalable execution-based benchmarks that only requires light guidance from humans. Specifically, we leverage a large language model (LLM) to convert an arbitrary piece of code into an evaluation example, including test cases for execution-based evaluation. We illustrate the usefulness of our framework by creating a dataset, Exec-CSN, which includes 1,931 examples involving 293 libraries revised from code in 367 GitHub repositories taken from the CodeSearchNet dataset. To demonstrate the complexity and solvability of examples in Exec-CSN, we present a human study demonstrating that 81.3% of the examples can be solved by humans and 61% are rated as ``requires effort to solve''. We conduct code generation experiments on open-source and proprietary models and analyze the performance of both humans and models. We will release the code of both the framework and the dataset upon acceptance.

Via

Access Paper or Ask Questions

It's MBR All the Way Down: Modern Generation Techniques Through the Lens of Minimum Bayes Risk

Oct 02, 2023

Amanda Bertsch, Alex Xie, Graham Neubig, Matthew R. Gormley

Figure 1 for It's MBR All the Way Down: Modern Generation Techniques Through the Lens of Minimum Bayes Risk

Figure 2 for It's MBR All the Way Down: Modern Generation Techniques Through the Lens of Minimum Bayes Risk

Figure 3 for It's MBR All the Way Down: Modern Generation Techniques Through the Lens of Minimum Bayes Risk

Figure 4 for It's MBR All the Way Down: Modern Generation Techniques Through the Lens of Minimum Bayes Risk

Abstract:Minimum Bayes Risk (MBR) decoding is a method for choosing the outputs of a machine learning system based not on the output with the highest probability, but the output with the lowest risk (expected error) among multiple candidates. It is a simple but powerful method: for an additional cost at inference time, MBR provides reliable several-point improvements across metrics for a wide variety of tasks without any additional data or training. Despite this, MBR is not frequently applied in NLP works, and knowledge of the method itself is limited. We first provide an introduction to the method and the recent literature. We show that several recent methods that do not reference MBR can be written as special cases of MBR; this reformulation provides additional theoretical justification for the performance of these methods, explaining some results that were previously only empirical. We provide theoretical and empirical results about the effectiveness of various MBR variants and make concrete recommendations for the application of MBR in NLP models, including future directions in this area.

* Under submission

Via

Access Paper or Ask Questions