Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Andreas Kirsch

Causal-BALD: Deep Bayesian Active Learning of Outcomes to Infer Treatment-Effects from Observational Data

Nov 03, 2021
Andrew Jesson, Panagiotis Tigas, Joost van Amersfoort, Andreas Kirsch, Uri Shalit, Yarin Gal

Figure 1 for Causal-BALD: Deep Bayesian Active Learning of Outcomes to Infer Treatment-Effects from Observational Data

Figure 2 for Causal-BALD: Deep Bayesian Active Learning of Outcomes to Infer Treatment-Effects from Observational Data

Figure 3 for Causal-BALD: Deep Bayesian Active Learning of Outcomes to Infer Treatment-Effects from Observational Data

Figure 4 for Causal-BALD: Deep Bayesian Active Learning of Outcomes to Infer Treatment-Effects from Observational Data

Estimating personalized treatment effects from high-dimensional observational data is essential in situations where experimental designs are infeasible, unethical, or expensive. Existing approaches rely on fitting deep models on outcomes observed for treated and control populations. However, when measuring individual outcomes is costly, as is the case of a tumor biopsy, a sample-efficient strategy for acquiring each result is required. Deep Bayesian active learning provides a framework for efficient data acquisition by selecting points with high uncertainty. However, existing methods bias training data acquisition towards regions of non-overlapping support between the treated and control populations. These are not sample-efficient because the treatment effect is not identifiable in such regions. We introduce causal, Bayesian acquisition functions grounded in information theory that bias data acquisition towards regions with overlapping support to maximize sample efficiency for learning personalized treatment effects. We demonstrate the performance of the proposed acquisition strategies on synthetic and semi-synthetic datasets IHDP and CMNIST and their extensions, which aim to simulate common dataset biases and pathologies.

* 24 pages, 8 Figures, 5 tables, NeurIPS 2021

Via

Access Paper or Ask Questions

Prioritized training on points that are learnable, worth learning, and not yet learned

Jul 06, 2021
Sören Mindermann, Muhammed Razzak, Winnie Xu, Andreas Kirsch, Mrinank Sharma, Adrien Morisot, Aidan N. Gomez, Sebastian Farquhar, Jan Brauner, Yarin Gal

Figure 1 for Prioritized training on points that are learnable, worth learning, and not yet learned

Figure 2 for Prioritized training on points that are learnable, worth learning, and not yet learned

Figure 3 for Prioritized training on points that are learnable, worth learning, and not yet learned

Figure 4 for Prioritized training on points that are learnable, worth learning, and not yet learned

We introduce Goldilocks Selection, a technique for faster model training which selects a sequence of training points that are "just right". We propose an information-theoretic acquisition function -- the reducible validation loss -- and compute it with a small proxy model -- GoldiProx -- to efficiently choose training points that maximize information about a validation set. We show that the "hard" (e.g. high loss) points usually selected in the optimization literature are typically noisy, while the "easy" (e.g. low noise) samples often prioritized for curriculum learning confer less information. Further, points with uncertain labels, typically targeted by active learning, tend to be less relevant to the task. In contrast, Goldilocks Selection chooses points that are "just right" and empirically outperforms the above approaches. Moreover, the selected sequence can transfer to other architectures; practitioners can share and reuse it without the need to recreate it.

* ICML 2021 Workshop on Subset Selection in Machine Learning

Via

Access Paper or Ask Questions

A Practical & Unified Notation for Information-Theoretic Quantities in ML

Jun 22, 2021
Andreas Kirsch, Yarin Gal

Figure 1 for A Practical & Unified Notation for Information-Theoretic Quantities in ML

Figure 2 for A Practical & Unified Notation for Information-Theoretic Quantities in ML

Figure 3 for A Practical & Unified Notation for Information-Theoretic Quantities in ML

Figure 4 for A Practical & Unified Notation for Information-Theoretic Quantities in ML

Information theory is of importance to machine learning, but the notation for information-theoretic quantities is sometimes opaque. The right notation can convey valuable intuitions and concisely express new ideas. We propose such a notation for machine learning users and expand it to include information-theoretic quantities between events (outcomes) and random variables. We apply this notation to a popular information-theoretic acquisition function in Bayesian active learning which selects the most informative (unlabelled) samples to be labelled by an expert. We demonstrate the value of our notation when extending the acquisition function to the core-set problem, which consists of selecting the most informative samples \emph{given} the labels.

Via

Access Paper or Ask Questions

A Simple Baseline for Batch Active Learning with Stochastic Acquisition Functions

Jun 22, 2021
Andreas Kirsch, Sebastian Farquhar, Yarin Gal

Figure 1 for A Simple Baseline for Batch Active Learning with Stochastic Acquisition Functions

Figure 2 for A Simple Baseline for Batch Active Learning with Stochastic Acquisition Functions

Figure 3 for A Simple Baseline for Batch Active Learning with Stochastic Acquisition Functions

In active learning, new labels are commonly acquired in batches. However, common acquisition functions are only meant for one-sample acquisition rounds at a time, and when their scores are used naively for batch acquisition, they result in batches lacking diversity, which deteriorates performance. On the other hand, state-of-the-art batch acquisition functions are costly to compute. In this paper, we present a novel class of stochastic acquisition functions that extend one-sample acquisition functions to the batch setting by observing how one-sample acquisition scores change as additional samples are acquired and modelling this difference for additional batch samples. We simply acquire new samples by sampling from the pool set using a Gibbs distribution based on the acquisition scores. Our acquisition functions are both vastly cheaper to compute and out-perform other batch acquisition functions.

Via

Access Paper or Ask Questions

Active Learning under Pool Set Distribution Shift and Noisy Data

Jun 22, 2021
Andreas Kirsch, Tom Rainforth, Yarin Gal

Figure 1 for Active Learning under Pool Set Distribution Shift and Noisy Data

Figure 2 for Active Learning under Pool Set Distribution Shift and Noisy Data

Figure 3 for Active Learning under Pool Set Distribution Shift and Noisy Data

Figure 4 for Active Learning under Pool Set Distribution Shift and Noisy Data

Active Learning is essential for more label-efficient deep learning. Bayesian Active Learning has focused on BALD, which reduces model parameter uncertainty. However, we show that BALD gets stuck on out-of-distribution or junk data that is not relevant for the task. We examine a novel *Expected Predictive Information Gain (EPIG)* to deal with distribution shifts of the pool set. EPIG reduces the uncertainty of *predictions* on an unlabelled *evaluation set* sampled from the test data distribution whose distribution might be different to the pool set distribution. Based on this, our new EPIG-BALD acquisition function for Bayesian Neural Networks selects samples to improve the performance on the test data distribution instead of selecting samples that reduce model uncertainty everywhere, including for out-of-distribution regions with low density in the test data distribution. Our method outperforms state-of-the-art Bayesian active learning methods on high-dimensional datasets and avoids out-of-distribution junk data in cases where current state-of-the-art methods fail.

Via

Access Paper or Ask Questions

Deterministic Neural Networks with Appropriate Inductive Biases Capture Epistemic and Aleatoric Uncertainty

Feb 23, 2021
Jishnu Mukhoti, Andreas Kirsch, Joost van Amersfoort, Philip H. S. Torr, Yarin Gal

Figure 1 for Deterministic Neural Networks with Appropriate Inductive Biases Capture Epistemic and Aleatoric Uncertainty

Figure 2 for Deterministic Neural Networks with Appropriate Inductive Biases Capture Epistemic and Aleatoric Uncertainty

Figure 3 for Deterministic Neural Networks with Appropriate Inductive Biases Capture Epistemic and Aleatoric Uncertainty

Figure 4 for Deterministic Neural Networks with Appropriate Inductive Biases Capture Epistemic and Aleatoric Uncertainty

We show that a single softmax neural net with minimal changes can beat the uncertainty predictions of Deep Ensembles and other more complex single-forward-pass uncertainty approaches. Softmax neural nets cannot capture epistemic uncertainty reliably because for OoD points they extrapolate arbitrarily and suffer from feature collapse. This results in arbitrary softmax entropies for OoD points which can have high entropy, low, or anything in between. We study why, and show that with the right inductive biases, softmax neural nets trained with maximum likelihood reliably capture epistemic uncertainty through the feature-space density. This density is obtained using Gaussian Discriminant Analysis, but it cannot disentangle uncertainties. We show that it is necessary to combine this density with the softmax entropy to disentangle aleatoric and epistemic uncertainty -- crucial e.g. for active learning. We examine the quality of epistemic uncertainty on active learning and OoD detection, where we obtain SOTA ~0.98 AUROC on CIFAR-10 vs SVHN.

Via

Access Paper or Ask Questions

PowerEvaluationBALD: Efficient Evaluation-Oriented Deep (Bayesian) Active Learning with Stochastic Acquisition Functions

Jan 10, 2021
Andreas Kirsch

Figure 1 for PowerEvaluationBALD: Efficient Evaluation-Oriented Deep (Bayesian) Active Learning with Stochastic Acquisition Functions

Figure 2 for PowerEvaluationBALD: Efficient Evaluation-Oriented Deep (Bayesian) Active Learning with Stochastic Acquisition Functions

Figure 3 for PowerEvaluationBALD: Efficient Evaluation-Oriented Deep (Bayesian) Active Learning with Stochastic Acquisition Functions

Figure 4 for PowerEvaluationBALD: Efficient Evaluation-Oriented Deep (Bayesian) Active Learning with Stochastic Acquisition Functions

We develop BatchEvaluationBALD, a new acquisition function for deep Bayesian active learning, as an expansion of BatchBALD that takes into account an evaluation set of unlabeled data, for example, the pool set. We also develop a variant for the non-Bayesian setting, which we call Evaluation Information Gain. To reduce computational requirements and allow these methods to scale to larger acquisition batch sizes, we introduce stochastic acquisition functions that use importance-sampling of tempered acquisition scores. We call this method PowerEvaluationBALD. We show in first experiments that PowerEvaluationBALD works on par with BatchEvaluationBALD, which outperforms BatchBALD on Repeated MNIST (MNISTx2), while massively reducing the computational requirements compared to BatchBALD or BatchEvaluationBALD.

Via

Access Paper or Ask Questions

Unpacking Information Bottlenecks: Unifying Information-Theoretic Objectives in Deep Learning

Apr 09, 2020
Andreas Kirsch, Clare Lyle, Yarin Gal

Figure 1 for Unpacking Information Bottlenecks: Unifying Information-Theoretic Objectives in Deep Learning

Figure 2 for Unpacking Information Bottlenecks: Unifying Information-Theoretic Objectives in Deep Learning

Figure 3 for Unpacking Information Bottlenecks: Unifying Information-Theoretic Objectives in Deep Learning

Figure 4 for Unpacking Information Bottlenecks: Unifying Information-Theoretic Objectives in Deep Learning

The information bottleneck (IB) principle offers both a mechanism to explain how deep neural networks train and generalize, as well as a regularized objective with which to train models. However, multiple competing objectives have been proposed based on this principle, and the information-theoretic quantities in these objectives are difficult to compute for large deep neural networks. This, in turn, limits their use as a training objective. In this work, we review these quantities, compare and unify previously proposed objectives and relate them to surrogate objectives more friendly to optimization. We find that these surrogate objectives allow us to apply the information bottleneck to modern neural network architectures. We demonstrate our insights on Permutation-MNIST, MNIST and CIFAR10.

Via

Access Paper or Ask Questions

BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning

Jun 19, 2019
Andreas Kirsch, Joost van Amersfoort, Yarin Gal

Figure 1 for BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning

Figure 2 for BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning

Figure 3 for BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning

Figure 4 for BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning

We develop BatchBALD, a tractable approximation to the mutual information between a batch of points and model parameters, which we use as an acquisition function to select multiple informative points jointly for the task of deep Bayesian active learning. BatchBALD is a greedy linear-time $1 - \frac{1}{e}$-approximate algorithm amenable to dynamic programming and efficient caching. We compare BatchBALD to the commonly used approach for batch data acquisition and find that the current approach acquires similar and redundant points, sometimes performing worse than randomly acquiring data. We finish by showing that, using BatchBALD to consider dependencies within an acquisition batch, we achieve new state of the art performance on standard benchmarks, providing substantial data efficiency improvements in batch acquisition.

Via

Access Paper or Ask Questions