
Nate Gruver


Large Language Models Are Zero-Shot Time Series Forecasters

Oct 11, 2023
Nate Gruver, Marc Finzi, Shikai Qiu, Andrew Gordon Wilson

By encoding time series as a string of numerical digits, we can frame time series forecasting as next-token prediction in text. Developing this approach, we find that large language models (LLMs) such as GPT-3 and LLaMA-2 can surprisingly zero-shot extrapolate time series at a level comparable to or exceeding the performance of purpose-built time series models trained on the downstream tasks. To facilitate this performance, we propose procedures for effectively tokenizing time series data and converting discrete distributions over tokens into highly flexible densities over continuous values. We argue the success of LLMs for time series stems from their ability to naturally represent multimodal distributions, in conjunction with biases toward simplicity and repetition, which align with the salient features in many time series, such as repeated seasonal trends. We also show how LLMs can naturally handle missing data without imputation through non-numerical text, accommodate textual side information, and answer questions to help explain predictions. While we find that increasing model size generally improves performance on time series, we show GPT-4 can perform worse than GPT-3 because of how it tokenizes numbers and because of its poor uncertainty calibration, which is likely the result of alignment interventions such as RLHF.

* NeurIPS 2023. Code available at: https://github.com/ngruver/llmtime 
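The digit encoding the abstract describes can be sketched in a few lines. This is a minimal illustration of the idea, not the released llmtime preprocessing: the function names, the fixed-precision scaling, and the separator conventions here are assumptions.

```python
def encode_series(values, precision=2, sep=", "):
    """Render a numeric series as a digit string for an LLM prompt: fixed
    precision, decimal point dropped, digits space-separated so common
    tokenizers see one token per digit, timesteps joined by commas."""
    out = []
    for v in values:
        digits = " ".join(f"{abs(v):.{precision}f}".replace(".", ""))
        out.append(("-" if v < 0 else "") + digits)
    return sep.join(out)

def decode_series(text, precision=2, sep=", "):
    """Invert encode_series: strip the spaces, reinsert the decimal scale."""
    vals = []
    for tok in text.split(sep):
        sign = -1.0 if tok.startswith("-") else 1.0
        vals.append(sign * int(tok.lstrip("-").replace(" ", "")) / 10**precision)
    return vals
```

For example, `encode_series([1.23, -4.5])` yields `"1 2 3, -4 5 0"`, which round-trips through `decode_series`; a forecast is then just the model's continuation of such a string, decoded back into numbers.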

Protein Design with Guided Discrete Diffusion

May 31, 2023
Nate Gruver, Samuel Stanton, Nathan C. Frey, Tim G. J. Rudner, Isidro Hotzel, Julien Lafrance-Vanasse, Arvind Rajpal, Kyunghyun Cho, Andrew Gordon Wilson

A popular approach to protein design is to combine a generative model with a discriminative model for conditional sampling. The generative model samples plausible sequences while the discriminative model guides a search for sequences with high fitness. Given its broad success in conditional sampling, classifier-guided diffusion modeling is a promising foundation for protein design, leading many to develop guided diffusion models for structure with inverse folding to recover sequences. In this work, we propose diffusioN Optimized Sampling (NOS), a guidance method for discrete diffusion models that follows gradients in the hidden states of the denoising network. NOS makes it possible to perform design directly in sequence space, circumventing significant limitations of structure-based methods, including scarce data and challenging inverse design. Moreover, we use NOS to generalize LaMBO, a Bayesian optimization procedure for sequence design that facilitates multiple objectives and edit-based constraints. The resulting method, LaMBO-2, enables discrete diffusions and stronger performance with limited edits through a novel application of saliency maps. We apply LaMBO-2 to a real-world protein design task, optimizing antibodies for higher expression yield and binding affinity to a therapeutic target under locality and liability constraints, with 97% expression rate and 25% binding rate in exploratory in vitro experiments.
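The hidden-state guidance at the heart of NOS can be caricatured as Langevin-style ascent on the denoiser's continuous hidden states: rather than editing discrete tokens directly, follow the gradient of a learned value head through those states, with noise to keep sampling stochastic. Everything below (names, the update rule, the stand-in gradient function) is an illustrative assumption, not the paper's exact sampler.

```python
import numpy as np

def guided_hidden_steps(h, value_grad_fn, n_steps=10, step=0.1, noise=0.01, seed=0):
    """Nudge denoiser hidden states h along the gradient of a value
    (fitness) head, plus Gaussian noise. `value_grad_fn` stands in for
    autograd through a discriminative head attached to the hidden states."""
    rng = np.random.default_rng(seed)
    h = np.asarray(h, dtype=float).copy()
    for _ in range(n_steps):
        h += step * value_grad_fn(h) + noise * rng.standard_normal(h.shape)
    return h
```

With a toy quadratic value centered at a target, the iterates drift toward that target; in the real method the updated hidden states are then decoded back into a distribution over sequence tokens.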


On Feature Learning in the Presence of Spurious Correlations

Oct 20, 2022
Pavel Izmailov, Polina Kirichenko, Nate Gruver, Andrew Gordon Wilson

Deep classifiers are known to rely on spurious features – patterns which are correlated with the target on the training data but not inherently relevant to the learning problem, such as the image backgrounds when classifying the foregrounds. In this paper we evaluate the amount of information about the core (non-spurious) features that can be decoded from the representations learned by standard empirical risk minimization (ERM) and specialized group robustness training. Following recent work on Deep Feature Reweighting (DFR), we evaluate the feature representations by re-training the last layer of the model on a held-out set where the spurious correlation is broken. On multiple vision and NLP problems, we show that the features learned by simple ERM are highly competitive with the features learned by specialized group robustness methods targeted at reducing the effect of spurious correlations. Moreover, we show that the quality of learned feature representations is greatly affected by the design decisions beyond the training method, such as the model architecture and pre-training strategy. On the other hand, we find that strong regularization is not necessary for learning high quality feature representations. Finally, using insights from our analysis, we significantly improve upon the best results reported in the literature on the popular Waterbirds, CelebA hair color prediction and WILDS-FMOW problems, achieving 97%, 92% and 50% worst-group accuracies, respectively.

* NeurIPS 2022. Code available at https://github.com/izmailovpavel/spurious_feature_learning 
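The last-layer retraining protocol described above has a compact core: freeze the backbone, keep its penultimate-layer features, and refit only a linear head on held-out data where the spurious correlation is broken. Below is a minimal numpy sketch assuming precomputed features and binary labels; the paper's actual DFR protocol (group-balanced subsampling, regularization sweeps) is richer.

```python
import numpy as np

def retrain_last_layer(feats, labels, lr=0.1, steps=500):
    """Refit only a linear head (logistic regression by gradient descent)
    on features from a frozen backbone, using held-out data where the
    spurious correlation is broken. Returns the head's weights and bias."""
    n, d = feats.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # sigmoid probabilities
        w -= lr * feats.T @ (p - labels) / n        # logistic-loss gradient
        b -= lr * np.mean(p - labels)
    return w, b
```

On synthetic data where feature 0 is the core signal and feature 1 is noise on the held-out split, the retrained head puts nearly all its weight on the core feature, which is exactly the quantity the paper uses to probe what the representation has learned.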

The Lie Derivative for Measuring Learned Equivariance

Oct 06, 2022
Nate Gruver, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson

Equivariance guarantees that a model's predictions capture key symmetries in data. When an image is translated or rotated, an equivariant model's representation of that image will translate or rotate accordingly. The success of convolutional neural networks has historically been tied to translation equivariance directly encoded in their architecture. The rising success of vision transformers, which have no explicit architectural bias towards equivariance, challenges this narrative and suggests that augmentations and training data might also play a significant role in their performance. In order to better understand the role of equivariance in recent vision models, we introduce the Lie derivative, a method for measuring equivariance with strong mathematical foundations and minimal hyperparameters. Using the Lie derivative, we study the equivariance properties of hundreds of pretrained models, spanning CNNs, transformers, and Mixer architectures. The scale of our analysis allows us to separate the impact of architecture from other factors like model size or training method. Surprisingly, we find that many violations of equivariance can be linked to spatial aliasing in ubiquitous network layers, such as pointwise non-linearities, and that as models get larger and more accurate they tend to display more equivariance, regardless of architecture. For example, transformers can be more equivariant than convolutional neural networks after training.
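The Lie derivative measures how a model's output changes, to first order, as the input is continuously transformed and the transformation is undone on the output. For translation of a 1-D periodic signal it can be estimated by finite differences, as in the sketch below; this conveys the spirit of the metric, not the paper's exact estimator (which handles images and layer-wise attribution).

```python
import numpy as np

def shift(x, t):
    """Continuously translate a 1-D periodic signal by t samples (Fourier shift)."""
    freqs = np.fft.fftfreq(x.shape[-1])
    return np.real(np.fft.ifft(np.fft.fft(x) * np.exp(-2j * np.pi * freqs * t)))

def translation_lie_derivative(f, x, eps=1e-3):
    """Central-difference estimate of the Lie derivative of f under translation:
    d/dt [ shift(f(shift(x, t)), -t) ] at t = 0. This vanishes identically
    when f is translation-equivariant."""
    def aligned(t):
        return shift(f(shift(x, t)), -t)
    return (aligned(eps) - aligned(-eps)) / (2 * eps)
```

A circular shift (equivariant) produces a Lie derivative of essentially zero, while multiplying by a fixed spatial mask (non-equivariant) produces a clearly nonzero one, mirroring how the paper scores pretrained networks.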


Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders

Mar 23, 2022
Samuel Stanton, Wesley Maddox, Nate Gruver, Phillip Maffettone, Emily Delaney, Peyton Greenside, Andrew Gordon Wilson

Bayesian optimization is a gold standard for query-efficient continuous optimization. However, its adoption for drug and antibody sequence design has been hindered by the discrete, high-dimensional nature of the decision variables. We develop a new approach (LaMBO) which jointly trains a denoising autoencoder with a discriminative multi-task Gaussian process head, enabling gradient-based optimization of multi-objective acquisition functions in the latent space of the autoencoder. These acquisition functions allow LaMBO to balance the explore-exploit trade-off over multiple design rounds, and to balance objective tradeoffs by optimizing sequences at many different points on the Pareto frontier. We evaluate LaMBO on a small-molecule task based on the ZINC dataset and introduce a new large-molecule task targeting fluorescent proteins. In our experiments, LaMBO outperforms genetic optimizers and does not require a large pretraining corpus, demonstrating that Bayesian optimization is practical and effective for biological sequence design.


Deconstructing the Inductive Biases of Hamiltonian Neural Networks

Feb 12, 2022
Nate Gruver, Marc Finzi, Samuel Stanton, Andrew Gordon Wilson

Physics-inspired neural networks (NNs), such as Hamiltonian or Lagrangian NNs, dramatically outperform other learned dynamics models by leveraging strong inductive biases. These models, however, are challenging to apply to many real world systems, such as those that don't conserve energy or contain contacts, a common setting for robotics and reinforcement learning. In this paper, we examine the inductive biases that make physics-inspired models successful in practice. We show that, contrary to conventional wisdom, the improved generalization of HNNs is the result of modeling acceleration directly and avoiding artificial complexity from the coordinate system, rather than symplectic structure or energy conservation. We show that by relaxing the inductive biases of these models, we can match or exceed performance on energy-conserving systems while dramatically improving performance on practical, non-conservative systems. We extend this approach to constructing transition models for common Mujoco environments, showing that our model can appropriately balance inductive biases with the flexibility required for model-based control.

* ICLR 2022. Code available at https://github.com/ngruver/decon-hnn 
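The inductive bias the paper isolates, modeling acceleration directly rather than imposing Hamiltonian structure, amounts to learning a second-order ODE and integrating it. A minimal sketch with semi-implicit Euler; `accel_fn` stands in for a trained network, and the integrator choice is an illustrative assumption rather than the paper's setup.

```python
import numpy as np

def rollout_second_order(accel_fn, q0, v0, dt, steps):
    """Roll out a learned second-order model q'' = accel_fn(q, v) with
    semi-implicit Euler. No energy conservation or symplectic structure
    is imposed on the model itself."""
    q, v = np.asarray(q0, dtype=float), np.asarray(v0, dtype=float)
    traj = [q.copy()]
    for _ in range(steps):
        v = v + dt * accel_fn(q, v)  # velocity from predicted acceleration
        q = q + dt * v               # position from the updated velocity
        traj.append(q.copy())
    return np.stack(traj)
```

Plugging in the harmonic oscillator's true acceleration (q'' = -q) gives stable, bounded oscillations; a trained `accel_fn` can equally represent friction or contacts, which is exactly what strict energy-conserving architectures rule out.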

Adaptive Informative Path Planning with Multimodal Sensing

Mar 21, 2020
Shushman Choudhury, Nate Gruver, Mykel J. Kochenderfer

Adaptive Informative Path Planning (AIPP) problems model an agent tasked with obtaining information subject to resource constraints in unknown, partially observable environments. Existing work on AIPP has focused on representing observations about the world as a result of agent movement. We formulate the more general setting where the agent may choose between different sensors at the cost of some energy, in addition to traversing the environment to gather information. We call this problem AIPPMS (MS for Multimodal Sensing). AIPPMS requires reasoning jointly about the effects of sensing and movement in terms of both energy expended and information gained. We frame AIPPMS as a Partially Observable Markov Decision Process (POMDP) and solve it with online planning. Our approach is based on the Partially Observable Monte Carlo Planning framework with modifications to ensure constraint feasibility and a heuristic rollout policy tailored for AIPPMS. We evaluate our method on two domains: a simulated search-and-rescue scenario and a challenging extension to the classic RockSample problem. We find that our approach outperforms a classic AIPP algorithm that is modified for AIPPMS, as well as online planning using a random rollout policy.

* First two authors contributed equally; International Conference on Automated Planning and Scheduling (ICAPS) 2020 

Using Latent Variable Models to Observe Academic Pathways

May 31, 2019
Nate Gruver, Ali Malik, Brahm Capoor, Chris Piech, Mitchell L. Stevens, Andreas Paepcke

Understanding large-scale patterns in student course enrollment is a problem of great interest to university administrators and educational researchers. Yet important decisions are often made without a good quantitative framework of the process underlying student choices. We propose a probabilistic approach to modelling course enrollment decisions, drawing inspiration from multilabel classification and mixture models. We use ten years of anonymized student transcripts from a large university to construct a Gaussian latent variable model that learns the joint distribution over course enrollments. The model allows for a diverse set of inference queries and is robust to data sparsity. We demonstrate the efficacy of this approach in comparison to others, including deep learning architectures, and demonstrate its ability to infer the underlying student interests that guide enrollment decisions.

* Twelfth International Conference on Educational Data Mining 

Amanuensis: The Programmer's Apprentice

Jun 29, 2018
Thomas Dean, Maurice Chiang, Marcus Gomez, Nate Gruver, Yousef Hindy, Michelle Lam, Peter Lu, Sophia Sanchez, Rohun Saxena, Michael Smith, Lucy Wang, Catherine Wong

This document provides an overview of the material covered in a course taught at Stanford in the spring quarter of 2018. The course draws upon insight from cognitive and systems neuroscience to implement hybrid connectionist and symbolic reasoning systems that leverage and extend the state of the art in machine learning by integrating human and machine intelligence. As a concrete example we focus on digital assistants that learn from continuous dialog with an expert software engineer while providing initial value as powerful analytical, computational and mathematical savants. Over time these savants learn cognitive strategies (domain-relevant problem solving skills) and develop intuitions (heuristics and the experience necessary for applying them) by learning from their expert associates. By doing so these savants elevate their innate analytical skills, allowing them to partner on an equal footing as versatile collaborators, effectively serving as cognitive extensions and digital prostheses, thereby amplifying and emulating their human partner's conceptually flexible thinking patterns and enabling improved access to and control over powerful computing resources.
