Zico Kolter

Improving Alignment and Robustness with Short Circuiting

Jun 06, 2024

Andy Zou, Long Phan, Justin Wang, Derek Duenas, Maxwell Lin, Maksym Andriushchenko, Rowan Wang, Zico Kolter, Matt Fredrikson, Dan Hendrycks

Abstract: AI systems can take harmful actions and are highly vulnerable to adversarial attacks. We present an approach, inspired by recent advances in representation engineering, that "short-circuits" models as they respond with harmful outputs. Existing techniques aimed at improving alignment, such as refusal training, are often bypassed. Techniques such as adversarial training try to plug these holes by countering specific attacks. As an alternative to refusal training and adversarial training, short-circuiting directly controls the representations that are responsible for harmful outputs in the first place. Our technique can be applied to both text-only and multimodal language models to prevent the generation of harmful outputs without sacrificing utility -- even in the presence of powerful unseen attacks. Notably, while adversarial robustness in standalone image recognition remains an open challenge, short-circuiting allows the larger multimodal system to reliably withstand image "hijacks" that aim to produce harmful content. Finally, we extend our approach to AI agents, demonstrating considerable reductions in the rate of harmful actions when they are under attack. Our approach represents a significant step forward in the development of reliable safeguards to harmful behavior and adversarial attacks.

Neural Network Verification with Branch-and-Bound for General Nonlinearities

May 31, 2024

Zhouxing Shi, Qirui Jin, Zico Kolter, Suman Jana, Cho-Jui Hsieh, Huan Zhang

Abstract: Branch-and-bound (BaB) is among the most effective methods for neural network (NN) verification. However, existing works on BaB have mostly focused on NNs with piecewise linear activations, especially ReLU networks. In this paper, we develop a general framework, named GenBaB, to conduct BaB for general nonlinearities in general computational graphs based on linear bound propagation. To decide which neuron to branch, we design a new branching heuristic which leverages linear bounds as shortcuts to efficiently estimate the potential improvement after branching. To decide nontrivial branching points for general nonlinear functions, we propose to optimize branching points offline, which can be efficiently leveraged during verification with a lookup table. We demonstrate the effectiveness of our GenBaB on verifying a wide range of NNs, including networks with activation functions such as Sigmoid, Tanh, Sine and GeLU, as well as networks involving multi-dimensional nonlinear operations such as multiplications in LSTMs and Vision Transformers. Our framework also allows the verification of general nonlinear computation graphs and enables verification applications beyond simple neural networks, particularly for AC Optimal Power Flow (ACOPF). GenBaB is part of the latest $\alpha,\!\beta$-CROWN, the winner of the 4th International Verification of Neural Networks Competition (VNN-COMP 2023).

* Preprint

Why is SAM Robust to Label Noise?

May 06, 2024

Christina Baek, Zico Kolter, Aditi Raghunathan

Abstract: Sharpness-Aware Minimization (SAM) is best known for achieving state-of-the-art performance on natural image and language tasks. However, its most pronounced improvements (of tens of percent) are instead in the presence of label noise. Understanding SAM's label noise robustness requires a departure from characterizing the robustness of minima lying in "flatter" regions of the loss landscape. In particular, the peak performance under label noise occurs with early stopping, far before the loss converges. We decompose SAM's robustness into two effects: one induced by changes to the logit term and the other induced by changes to the network Jacobian. The first can be observed in linear logistic regression where SAM provably up-weights the gradient contribution from clean examples. Although this explicit up-weighting is also observable in neural networks, when we intervene and modify SAM to remove this effect, surprisingly, we see no visible degradation in performance. We infer that SAM's effect in deeper networks is instead explained entirely by the effect SAM has on the network Jacobian. We theoretically derive the implicit regularization induced by this Jacobian effect in two-layer linear networks. Motivated by our analysis, we see that cheaper alternatives to SAM that explicitly induce these regularization effects largely recover the benefits in deep networks trained on real-world datasets.

Forcing Diffuse Distributions out of Language Models

Apr 16, 2024

Yiming Zhang, Avi Schwarzschild, Nicholas Carlini, Zico Kolter, Daphne Ippolito

Abstract: Despite being trained specifically to follow user instructions, today's language models perform poorly when instructed to produce random outputs. For example, when prompted to pick a number uniformly between one and ten, Llama-2-13B-chat disproportionately favors the number five, and when tasked with picking a first name at random, Mistral-7B-Instruct chooses Avery 40 times more often than we would expect based on the U.S. population. When these language models are used for real-world tasks where diversity of outputs is crucial, such as language model assisted dataset construction, their inability to produce diffuse distributions over valid choices is a major hurdle. In this work, we propose a fine-tuning method that encourages language models to output distributions that are diffuse over valid outcomes. The methods we introduce generalize across a variety of tasks and distributions and make large language models practical for synthetic dataset generation with little human intervention.

Predicting the Performance of Foundation Models via Agreement-on-the-Line

Apr 02, 2024

Aman Mehra, Rahul Saxena, Taeyoun Kim, Christina Baek, Zico Kolter, Aditi Raghunathan

Abstract: Estimating the out-of-distribution performance in regimes where labels are scarce is critical to safely deploy foundation models. Recently, it was shown that ensembles of neural networks exhibit the phenomenon "agreement-on-the-line", which can be leveraged to reliably predict OOD performance without labels. However, in contrast to classical neural networks that are trained on in-distribution data from scratch for numerous epochs, foundation models undergo minimal finetuning from heavily pretrained weights, which may reduce the ensemble diversity needed to observe agreement-on-the-line. In our work, we demonstrate that when lightly finetuning multiple runs from a $\textit{single}$ foundation model, the choice of randomness during training (linear head initialization, data ordering, and data subsetting) can lead to drastically different levels of agreement-on-the-line in the resulting ensemble. Surprisingly, only random head initialization is able to reliably induce agreement-on-the-line in finetuned foundation models across vision and language benchmarks. Second, we demonstrate that ensembles of $\textit{multiple}$ foundation models pretrained on different datasets but finetuned on the same task can also show agreement-on-the-line. In total, by careful construction of a diverse ensemble, we can utilize agreement-on-the-line-based methods to predict the OOD performance of foundation models with high precision.

DART: Implicit Doppler Tomography for Radar Novel View Synthesis

Mar 06, 2024

Tianshu Huang, John Miller, Akarsh Prabhakara, Tao Jin, Tarana Laroia, Zico Kolter, Anthony Rowe

Abstract: Simulation is an invaluable tool for radio-frequency system designers that enables rapid prototyping of various algorithms for imaging, target detection, classification, and tracking. However, simulating realistic radar scans is a challenging task that requires an accurate model of the scene, radio frequency material properties, and a corresponding radar synthesis function. Rather than specifying these models explicitly, we propose DART - Doppler Aided Radar Tomography, a Neural Radiance Field-inspired method which uses radar-specific physics to create a reflectance and transmittance-based rendering pipeline for range-Doppler images. We then evaluate DART by constructing a custom data collection platform and collecting a novel radar dataset together with accurate position and instantaneous velocity measurements from lidar-based localization. In comparison to state-of-the-art baselines, DART synthesizes superior radar range-Doppler images from novel views across all datasets and additionally can be used to generate high-quality tomographic images.

* To appear in CVPR 2024; see https://wiselabcmu.github.io/dart/ for our project site

Generative Posterior Networks for Approximately Bayesian Epistemic Uncertainty Estimation

Dec 29, 2023

Melrose Roderick, Felix Berkenkamp, Fatemeh Sheikholeslami, Zico Kolter

Abstract: In many real-world problems, there is a limited set of training data, but an abundance of unlabeled data. We propose a new method, Generative Posterior Networks (GPNs), that uses unlabeled data to estimate epistemic uncertainty in high-dimensional problems. A GPN is a generative model that, given a prior distribution over functions, approximates the posterior distribution directly by regularizing the network towards samples from the prior. We prove theoretically that our method indeed approximates the Bayesian posterior and show empirically that it improves epistemic uncertainty estimation and scalability over competing methods.

* 10 pages, 3 figures, 2 tables

Reliable Test-Time Adaptation via Agreement-on-the-Line

Oct 07, 2023

Eungyeup Kim, Mingjie Sun, Aditi Raghunathan, Zico Kolter

Abstract: Test-time adaptation (TTA) methods aim to improve robustness to distribution shifts by adapting models using unlabeled data from the shifted test distribution. However, there remain unresolved challenges that undermine the reliability of TTA, which include difficulties in evaluating TTA performance, miscalibration after TTA, and unreliable hyperparameter tuning for adaptation. In this work, we make a notable and surprising observation that TTAed models strongly show the agreement-on-the-line phenomenon (Baek et al., 2022) across a wide range of distribution shifts. We find such linear trends occur consistently in a wide range of models adapted with various hyperparameters, and persist in distributions where the phenomenon fails to hold in vanilla models (i.e., before adaptation). We leverage these observations to make TTA methods more reliable in three respects: (i) estimating OOD accuracy (without labeled data) to determine when TTA helps and when it hurts, (ii) calibrating TTAed models without label information, and (iii) reliably determining hyperparameters for TTA without any labeled validation data. Through extensive experiments, we demonstrate that various TTA methods can be precisely evaluated, both in terms of their improvements and degradations. Moreover, our proposed methods on unsupervised calibration and hyperparameter tuning for TTA achieve results close to the ones assuming access to ground-truth labels, in terms of both OOD accuracy and calibration error.

* 19 pages, 9 figures

Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation

Jun 01, 2023

Runtian Zhai, Bingbin Liu, Andrej Risteski, Zico Kolter, Pradeep Ravikumar

Abstract: Good data augmentation is one of the key factors that lead to the empirical success of self-supervised representation learning such as contrastive learning and masked language modeling, yet theoretical understanding of its role in learning good representations remains limited. Recent work has built the connection between self-supervised learning and approximating the top eigenspace of a graph Laplacian operator. Learning a linear probe on top of such features can naturally be connected to RKHS regression. In this work, we use this insight to perform a statistical analysis of augmentation-based pretraining. We start from the isometry property, a key geometric characterization of the target function given by the augmentation. Our first main theorem provides, for an arbitrary encoder, near-tight bounds for both the estimation error incurred by fitting the linear probe on top of the encoder, and the approximation error entailed by the fitness of the RKHS the encoder learns. Our second main theorem specifically addresses the case where the encoder extracts the top-d eigenspace of a Monte-Carlo approximation of the underlying kernel with the finite pretraining samples. Our analysis completely disentangles the effects of the model and the augmentation. A key ingredient in our analysis is the augmentation complexity, which we use to quantitatively compare different augmentations and analyze their impact on downstream performance on synthetic and real datasets.

* 33 pages

Low-Variance Gradient Estimation in Unrolled Computation Graphs with ES-Single

Apr 21, 2023

Paul Vicol, Zico Kolter, Kevin Swersky

Abstract: We propose an evolution strategies-based algorithm for estimating gradients in unrolled computation graphs, called ES-Single. Similarly to the recently proposed Persistent Evolution Strategies (PES), ES-Single is unbiased, and overcomes chaos arising from recursive function applications by smoothing the meta-loss landscape. ES-Single samples a single perturbation per particle that is kept fixed over the course of an inner problem (e.g., perturbations are not re-sampled for each partial unroll). Compared to PES, ES-Single is simpler to implement and has lower variance: the variance of ES-Single is constant with respect to the number of truncated unrolls, removing a key barrier in applying ES to long inner problems using short truncations. We show that ES-Single is unbiased for quadratic inner problems, and demonstrate empirically that its variance can be substantially lower than that of PES. ES-Single consistently outperforms PES on a variety of tasks, including a synthetic benchmark task, hyperparameter optimization, training recurrent neural networks, and training learned optimizers.
