
Maxwell Nye


Language Modeling with Latent Situations

Dec 20, 2022
Belinda Z. Li, Maxwell Nye, Jacob Andreas

Language models (LMs) often generate incoherent outputs: they refer to events and entity states that are incompatible with the state of the world described in their inputs. We introduce SituationSupervision, a family of approaches for improving coherence in LMs by training them to construct and condition on explicit representations of entities and their states. SituationSupervision has two components: an auxiliary situation modeling task that trains models to predict state representations in context, and a latent state inference procedure that imputes these states from partially annotated training data. SituationSupervision can be applied to both fine-tuning (by supervising LMs to encode state variables in their hidden representations) and prompting (by inducing LMs to interleave textual descriptions of entity states with output text). In both cases, SituationSupervision requires only a small number of state annotations to produce major coherence improvements (of 4-11%), showing that standard LMs can be sample-efficiently trained to model not just language but the situations it describes.
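
To make the prompting variant concrete, here is a minimal sketch in which few-shot examples interleave bracketed entity-state lines with the output text, so the model is induced to emit a state description before each continuation. The prompt format and the generate stub are illustrative assumptions, not the paper's released prompts.

```python
# Hypothetical sketch of the prompting variant of SituationSupervision:
# few-shot examples interleave bracketed entity-state descriptions with
# the output text, so the model is induced to emit a state line before
# each continuation it generates. The prompt format and `generate` stub
# are illustrative assumptions, not the paper's released prompts.

FEW_SHOT_PROMPT = """\
Story: Tom put the apple in the basket and went outside.
[State: apple -> in basket; Tom -> outside]
Continuation: Later, Tom came back inside and took the apple from the basket.
[State: apple -> held by Tom; Tom -> inside]

Story: Sara filled the kettle and placed it on the stove.
[State: kettle -> full, on stove; Sara -> in kitchen]
Continuation:"""


def generate(prompt: str) -> str:
    """Stand-in for a language-model call; returns a canned answer so the
    sketch runs end to end."""
    return ("She waited for the water to boil before pouring the tea.\n"
            "[State: kettle -> boiling, on stove; Sara -> in kitchen]")


if __name__ == "__main__":
    continuation = generate(FEW_SHOT_PROMPT)
    # Both the continuation and its state annotation come back, keeping the
    # generated text anchored to the described situation.
    print(continuation)
```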

* 13 pages, 3 figures, 7 tables 

Show Your Work: Scratchpads for Intermediate Computation with Language Models

Nov 30, 2021
Maxwell Nye, Anders Johan Andreassen, Guy Gur-Ari, Henryk Michalewski, Jacob Austin, David Bieber, David Dohan, Aitor Lewkowycz, Maarten Bosma, David Luan, Charles Sutton, Augustus Odena

Large pre-trained language models perform remarkably well on tasks that can be done "in one pass", such as generating realistic text or synthesizing computer programs. However, they struggle with tasks that require unbounded multi-step computation, such as adding integers or executing programs. Surprisingly, we find that these same models are able to perform complex multi-step computations -- even in the few-shot regime -- when asked to perform the operation "step by step", showing the results of intermediate computations. In particular, we train transformers to perform multi-step computations by asking them to emit intermediate computation steps into a "scratchpad". On a series of increasingly complex tasks ranging from long addition to the execution of arbitrary programs, we show that scratchpads dramatically improve the ability of language models to perform multi-step computations.
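
As an illustration of the idea (the exact format is an assumption, not the paper's released prompts), a scratchpad prompt for long addition spells out the per-digit steps and carries before committing to the final answer:

```python
# Illustrative scratchpad-style prompt for long addition: the model is
# asked to write out per-digit intermediate steps (including carries)
# before the final answer. The format is an assumption, not the paper's
# released prompt; `generate` is a stub so the sketch runs end to end.

SCRATCHPAD_PROMPT = """\
Input: 29 + 57
Scratchpad:
  9 + 7 = 16, write 6, carry 1
  2 + 5 + 1 = 8, write 8
Target: 86

Input: 48 + 76
Scratchpad:"""


def generate(prompt: str) -> str:
    """Stand-in for a language-model call; returns a canned completion."""
    return ("\n  8 + 6 = 14, write 4, carry 1"
            "\n  4 + 7 + 1 = 12, write 12"
            "\nTarget: 124")


if __name__ == "__main__":
    print(SCRATCHPAD_PROMPT + generate(SCRATCHPAD_PROMPT))
```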

Program Synthesis with Large Language Models

Aug 16, 2021
Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, Charles Sutton

This paper explores the limits of the current generation of large language models for program synthesis in general-purpose programming languages. We evaluate a collection of such models (with between 244M and 137B parameters) on two new benchmarks, MBPP and MathQA-Python, in both the few-shot and fine-tuning regimes. Our benchmarks are designed to measure the ability of these models to synthesize short Python programs from natural language descriptions. The Mostly Basic Programming Problems (MBPP) dataset contains 974 programming tasks, designed to be solvable by entry-level programmers. The MathQA-Python dataset, a Python version of the MathQA benchmark, contains 23,914 problems that evaluate the ability of the models to synthesize code from more complex text. On both datasets, we find that synthesis performance scales log-linearly with model size. Our largest models, even without fine-tuning on a code dataset, can synthesize solutions to 59.6 percent of the problems from MBPP using few-shot learning with a well-designed prompt. Fine-tuning on a held-out portion of the dataset improves performance by about 10 percentage points across most model sizes. On the MathQA-Python dataset, the largest fine-tuned model achieves 83.8 percent accuracy. Going further, we study the model's ability to engage in dialog about code, incorporating human feedback to improve its solutions. We find that natural language feedback from a human halves the error rate compared to the model's initial prediction. Additionally, we conduct an error analysis to shed light on where these models fall short and what types of programs are most difficult to generate. Finally, we explore the semantic grounding of these models by fine-tuning them to predict the results of program execution. We find that even our best models are generally unable to predict the output of a program given a specific input.
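
A minimal sketch of MBPP-style few-shot prompting follows: each example gives a natural-language problem, assert-based tests, and a solution, and the model is asked to complete the final solution. The prompt wording and the generate stub are assumptions for illustration; the returned solution is hand-written, not model output.

```python
# Minimal sketch of MBPP-style few-shot prompting: each example gives a
# natural-language problem, its assert-based tests, and a solution; the
# model is then asked to complete the last solution. The wording is an
# assumption, and `generate` is a stub returning a hand-written answer.

PROMPT = '''\
You are an expert Python programmer. Solve the following problem.

Problem: Write a function to find the maximum of two numbers.
Tests:
assert max_of_two(3, 5) == 5
assert max_of_two(-1, -4) == -1
Solution:
def max_of_two(a, b):
    return a if a > b else b

Problem: Write a function to reverse a string.
Tests:
assert reverse_string("abc") == "cba"
Solution:
'''


def generate(prompt: str) -> str:
    """Stand-in for a model call; returns a hand-written completion."""
    return 'def reverse_string(s):\n    return s[::-1]\n'


if __name__ == "__main__":
    completion = generate(PROMPT)
    namespace: dict = {}
    exec(completion, namespace)          # run the synthesized candidate
    assert namespace["reverse_string"]("abc") == "cba"
    print("candidate passes the test")
```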

* Jacob and Augustus contributed equally 

Improving Coherence and Consistency in Neural Sequence Models with Dual-System, Neuro-Symbolic Reasoning

Jul 06, 2021
Maxwell Nye, Michael Henry Tessler, Joshua B. Tenenbaum, Brenden M. Lake

Human reasoning can often be understood as an interplay between two systems: the intuitive and associative ("System 1") and the deliberative and logical ("System 2"). Neural sequence models -- which have been increasingly successful at performing complex, structured tasks -- exhibit the advantages and failure modes of System 1: they are fast and learn patterns from data, but are often inconsistent and incoherent. In this work, we seek a lightweight, training-free means of improving existing System 1-like sequence models by adding System 2-inspired logical reasoning. We explore several variations on this theme in which candidate generations from a neural sequence model are examined for logical consistency by a symbolic reasoning module, which can either accept or reject the generations. Our approach uses neural inference to mediate between the neural System 1 and the logical System 2. Results in robust story generation and grounded instruction-following show that this approach can increase the coherence and accuracy of neurally-based generations.
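
The generate-and-check loop can be sketched in a few lines: a "System 1" proposer emits candidate continuations, and a symbolic "System 2" checker accepts or rejects each one based on consistency with established facts. The world representation, candidates, and checker below are toy stand-ins, not the paper's models.

```python
# Toy sketch of the dual-system loop: a "System 1" proposer emits
# candidate continuations (here a fixed list standing in for neural
# sampling), and a symbolic "System 2" checker rejects any candidate
# whose presupposed facts contradict the established story world.
# All components are illustrative stand-ins, not the paper's models.

WORLD_FACTS = {("door", "locked"): True}

# Each candidate pairs a continuation with the facts it presupposes.
CANDIDATES = [
    ("Alice walked straight through the door.", {("door", "locked"): False}),
    ("Alice unlocked the door and stepped inside.", {("door", "locked"): True}),
]


def consistent(presupposed: dict) -> bool:
    """Symbolic check: every presupposed fact must match the world state."""
    return all(WORLD_FACTS.get(fact) == value
               for fact, value in presupposed.items())


def generate_consistent() -> str:
    for text, presupposed in CANDIDATES:   # stand-in for repeated sampling
        if consistent(presupposed):
            return text                    # System 2 accepts
    return "<no consistent continuation found>"


if __name__ == "__main__":
    print(generate_consistent())           # prints the second candidate
```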

Communicating Natural Programs to Humans and Machines

Jun 15, 2021
Samuel Acquaviva, Yewen Pu, Marta Kryven, Catherine Wong, Gabrielle E Ecanow, Maxwell Nye, Theodoros Sechopoulos, Michael Henry Tessler, Joshua B. Tenenbaum

The Abstraction and Reasoning Corpus (ARC) is a set of tasks that tests an agent's ability to flexibly solve novel problems. While most ARC tasks are easy for humans, they are challenging for state-of-the-art AI. How do we build intelligent systems that can generalize to novel situations and understand human instructions in domains such as ARC? We posit that the answer may be found by studying how humans communicate to each other in solving these tasks. We present LARC, the Language-annotated ARC: a collection of natural language descriptions by a group of human participants, unfamiliar both with ARC and with each other, who instruct each other on how to solve ARC tasks. LARC contains successful instructions for 88% of the ARC tasks. We analyze the collected instructions as "natural programs", finding that most natural program concepts have analogies in typical computer programs. However, unlike how one precisely programs a computer, we find that humans both anticipate and exploit ambiguities to communicate effectively. We demonstrate that a state-of-the-art program synthesis technique, which leverages the additional language annotations, outperforms its language-free counterpart.

* equal contributions: (author 3, 4), (author 5,6,7) 

Implicit Representations of Meaning in Neural Language Models

Jun 01, 2021
Belinda Z. Li, Maxwell Nye, Jacob Andreas

Does the effectiveness of neural language models derive entirely from accurate modeling of surface word co-occurrence statistics, or do these models represent and reason about the world they describe? In BART and T5 transformer language models, we identify contextual word representations that function as models of entities and situations as they evolve throughout a discourse. These neural representations have functional similarities to linguistic models of dynamic semantics: they support a linear readout of each entity's current properties and relations, and can be manipulated with predictable effects on language generation. Our results indicate that prediction in pretrained neural language models is supported, at least in part, by dynamic representations of meaning and implicit simulation of entity state, and that this behavior can be learned with only text as training data. Code and data are available at https://github.com/belindal/state-probes .
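
A minimal sketch of the linear-readout idea: a single linear layer maps a frozen contextual entity representation to a distribution over property values. The hidden size, property labels, and random input are illustrative assumptions; the actual probing code is in the linked repository.

```python
# Minimal sketch of a linear readout ("probe") over contextual entity
# representations: a single linear layer maps a frozen hidden state to
# logits over candidate property values. Shapes and labels are
# illustrative assumptions; see github.com/belindal/state-probes for the
# released code.

import torch
import torch.nn as nn

HIDDEN_DIM = 768                         # e.g. a BART/T5 hidden size
PROPERTY_VALUES = ["open", "closed", "broken"]


class LinearProbe(nn.Module):
    def __init__(self, hidden_dim: int, num_values: int):
        super().__init__()
        self.readout = nn.Linear(hidden_dim, num_values)

    def forward(self, entity_repr: torch.Tensor) -> torch.Tensor:
        # entity_repr: (batch, hidden_dim), taken from a frozen LM
        return self.readout(entity_repr)


if __name__ == "__main__":
    probe = LinearProbe(HIDDEN_DIM, len(PROPERTY_VALUES))
    fake_entity_repr = torch.randn(1, HIDDEN_DIM)   # stand-in for an LM state
    logits = probe(fake_entity_repr)
    predicted = PROPERTY_VALUES[logits.argmax(dim=-1).item()]
    print(f"probe predicts the entity is '{predicted}'")
```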

* 15 pages, 6 figures; accepted to ACL 2021 

Representing Partial Programs with Blended Abstract Semantics

Dec 23, 2020
Maxwell Nye, Yewen Pu, Matthew Bowers, Jacob Andreas, Joshua B. Tenenbaum, Armando Solar-Lezama

Synthesizing programs from examples requires searching over a vast, combinatorial space of possible programs. In this search process, a key challenge is representing the behavior of a partially written program before it can be executed, to judge if it is on the right track and predict where to search next. We introduce a general technique for representing partially written programs in a program synthesis engine. We take inspiration from the technique of abstract interpretation, in which an approximate execution model is used to determine if an unfinished program will eventually satisfy a goal specification. Here we learn an approximate execution model implemented as a modular neural network. By constructing compositional program representations that implicitly encode the interpretation semantics of the underlying programming language, we can represent partial programs using a flexible combination of concrete execution state and learned neural representations, using the learned approximate semantics when concrete semantics are not known (in unfinished parts of the program). We show that these hybrid neuro-symbolic representations enable execution-guided synthesizers to use more powerful language constructs, such as loops and higher-order functions, and can be used to synthesize programs more accurately for a given search budget than pure neural approaches in several domains.
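
One way to picture the blending is the toy recursion below: subtrees whose children are fully concrete are executed to real values, while holes (unwritten parts) fall back to a learned vector, and mixed nodes are combined by a per-operator module. The random matrices stand in for the paper's learned modular networks and are assumptions for illustration.

```python
# Toy sketch of blended abstract semantics: nodes of a partial program
# are either executed concretely (when all children have concrete
# values) or represented by vectors produced by per-operator modules
# (when any child is a hole). The matrices below are random stand-ins
# for the learned neural modules described in the paper.

import numpy as np

DIM = 16
rng = np.random.default_rng(0)
HOLE_VEC = rng.normal(size=DIM)            # embedding of an unwritten part
W_ADD = rng.normal(size=(DIM, 2 * DIM))    # stand-in "+" module
W_VAL = rng.normal(size=(DIM, 1))          # stand-in value encoder


def encode_value(v: float) -> np.ndarray:
    return np.tanh(W_VAL @ np.array([float(v)]))


def represent(node):
    """Return ('concrete', value) when the subtree can be executed,
    otherwise ('abstract', vector)."""
    kind = node[0]
    if kind == "const":
        return "concrete", node[1]
    if kind == "hole":
        return "abstract", HOLE_VEC
    if kind == "add":
        lk, lv = represent(node[1])
        rk, rv = represent(node[2])
        if lk == rk == "concrete":
            return "concrete", lv + rv     # real execution
        lvec = lv if lk == "abstract" else encode_value(lv)
        rvec = rv if rk == "abstract" else encode_value(rv)
        return "abstract", np.tanh(W_ADD @ np.concatenate([lvec, rvec]))
    raise ValueError(f"unknown node kind: {kind}")


if __name__ == "__main__":
    finished = ("add", ("const", 2), ("const", 3))
    partial = ("add", ("const", 2), ("hole",))
    print(represent(finished))             # ('concrete', 5)
    kind, vec = represent(partial)
    print(kind, vec.shape)                 # abstract (16,)
```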

DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning

Jun 15, 2020
Kevin Ellis, Catherine Wong, Maxwell Nye, Mathias Sable-Meyer, Luc Cary, Lucas Morales, Luke Hewitt, Armando Solar-Lezama, Joshua B. Tenenbaum

Expert problem-solving is driven by powerful languages for thinking about problems and their solutions. Acquiring expertise means learning these languages -- systems of concepts, alongside the skills to use them. We present DreamCoder, a system that learns to solve problems by writing programs. It builds expertise by creating programming languages for expressing domain concepts, together with neural networks to guide the search for programs within these languages. A "wake-sleep" learning algorithm alternately extends the language with new symbolic abstractions and trains the neural network on imagined and replayed problems. DreamCoder solves both classic inductive programming tasks and creative tasks such as drawing pictures and building scenes. It rediscovers the basics of modern functional programming, vector algebra and classical physics, including Newton's and Coulomb's laws. Concepts are built compositionally from those learned earlier, yielding multi-layered symbolic representations that are interpretable and transferable to new tasks, while still growing scalably and flexibly with experience.
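
The high-level alternation can be sketched as a loop, heavily simplified: a wake phase searches for programs guided by the recognition network, an abstraction sleep phase compresses found solutions into new library primitives, and a dream sleep phase trains the recognition network on replayed and imagined problems. Every function below is a stub standing in for the actual system.

```python
# Heavily simplified skeleton of the wake-sleep alternation described
# above. Every component below is a stub standing in for DreamCoder's
# actual program search, library compression, and recognition training.

def wake(tasks, library, recognizer):
    """Search for programs solving each task, guided by the recognizer."""
    return {t: f"<program for {t} using {len(library)} primitives>" for t in tasks}


def abstraction_sleep(library, solutions):
    """Compress shared structure in found programs into a new primitive."""
    return library + [f"primitive_{len(library)}"]


def dream_sleep(recognizer, library, solutions):
    """Train the recognizer on replayed solutions and imagined programs."""
    return recognizer + 1                  # stand-in for a gradient update


if __name__ == "__main__":
    tasks = ["sort list", "draw square"]
    library, recognizer = ["map", "fold"], 0
    for iteration in range(3):
        solutions = wake(tasks, library, recognizer)
        library = abstraction_sleep(library, solutions)
        recognizer = dream_sleep(recognizer, library, solutions)
    print(f"{len(library)} primitives after 3 iterations")
```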

Write, Execute, Assess: Program Synthesis with a REPL

Jun 09, 2019
Kevin Ellis, Maxwell Nye, Yewen Pu, Felix Sosa, Josh Tenenbaum, Armando Solar-Lezama

We present a neural program synthesis approach integrating components which write, execute, and assess code to navigate the search space of possible programs. We equip the search process with an interpreter or a read-eval-print loop (REPL), which immediately executes partially written programs, exposing their semantics. The REPL addresses a basic challenge of program synthesis: tiny changes in syntax can lead to huge changes in semantics. We train a pair of models, a policy that proposes the new piece of code to write, and a value function that assesses the prospects of the code written so far. At test time we can combine these models with a Sequential Monte Carlo algorithm. We apply our approach to two domains: synthesizing text editing programs and inferring 2D and 3D graphics programs.
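
A compressed sketch of the write-execute-assess loop in a toy string-building domain: a proposal step suggests the next instruction, a tiny REPL executes the partial program to expose its semantics, and a value heuristic keeps the most promising partial programs (a beam-style stand-in for the paper's learned policy/value networks and SMC search). The domain, heuristic, and search parameters are assumptions.

```python
# Toy sketch of the write-execute-assess loop: candidate instructions are
# proposed, a tiny REPL executes each partial program to expose its
# semantics, and a "value" heuristic ranks partial programs, keeping the
# best few. The learned policy/value networks and SMC search from the
# paper are replaced by exhaustive proposals and a prefix-match heuristic.

TARGET = "cab"
ALPHABET = "abc"


def execute(program):
    """The REPL: run a list of append instructions on an empty string."""
    out = ""
    for instr in program:
        out += instr
    return out


def value(state: str) -> int:
    """Heuristic stand-in for the learned value network: length of the
    shared prefix with the target output."""
    n = 0
    for a, b in zip(state, TARGET):
        if a != b:
            break
        n += 1
    return n


def search(beam_size: int = 3, max_len: int = 5):
    beam = [[]]
    for _ in range(max_len):
        candidates = [prog + [c] for prog in beam for c in ALPHABET]
        for prog in candidates:
            if execute(prog) == TARGET:
                return prog
        candidates.sort(key=lambda p: value(execute(p)), reverse=True)
        beam = candidates[:beam_size]      # keep the most promising programs
    return None


if __name__ == "__main__":
    print(search())                        # e.g. ['c', 'a', 'b']
```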

* The first four authors contributed equally to this work 

Learning to Infer Program Sketches

Feb 17, 2019
Maxwell Nye, Luke Hewitt, Joshua Tenenbaum, Armando Solar-Lezama

Our goal is to build systems which write code automatically from the kinds of specifications humans can most easily provide, such as examples and natural language instruction. The key idea of this work is that a flexible combination of pattern recognition and explicit reasoning can be used to solve these complex programming problems. We propose a method for dynamically integrating these types of information. Our novel intermediate representation and training algorithm allow a program synthesis system to learn, without direct supervision, when to rely on pattern recognition and when to perform symbolic search. Our model matches the memorization and generalization performance of neural synthesis and symbolic search, respectively, and achieves state-of-the-art performance on a dataset of simple English description-to-code programming problems.
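
The division of labor can be pictured in a toy arithmetic domain: a (stubbed) neural model proposes a program sketch with holes where it is uncertain, and a symbolic enumerator fills the holes against the input-output examples. The sketch grammar, hole names, and enumerator below are illustrative assumptions, not the paper's system.

```python
# Toy sketch of the neuro-symbolic division of labor: a (stubbed) neural
# model proposes a program sketch containing HOLE tokens where it is
# uncertain, and a symbolic enumerator searches over fillers for the
# holes until the input-output examples are satisfied.

from itertools import product

EXAMPLES = [(2, 7), (5, 13), (0, 3)]       # (input, output) pairs
CONSTANTS = list(range(-5, 6))


def propose_sketch(examples) -> str:
    """Stand-in for neural sketch generation: commits to the overall
    form of the program but leaves the constants as holes."""
    return "lambda x: HOLE1 * x + HOLE2"


def fill_holes(sketch: str, examples):
    """Symbolic search: enumerate hole fillers and test each candidate."""
    for c1, c2 in product(CONSTANTS, repeat=2):
        candidate = sketch.replace("HOLE1", str(c1)).replace("HOLE2", str(c2))
        fn = eval(candidate)               # safe only in this toy domain
        if all(fn(x) == y for x, y in examples):
            return candidate
    return None


if __name__ == "__main__":
    sketch = propose_sketch(EXAMPLES)
    print(fill_holes(sketch, EXAMPLES))    # lambda x: 2 * x + 3
```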
