Sebastian Farquhar

Model evaluation for extreme risks

May 24, 2023
Toby Shevlane, Sebastian Farquhar, Ben Garfinkel, Mary Phuong, Jess Whittlestone, Jade Leung, Daniel Kokotajlo, Nahema Marchal, Markus Anderljung, Noam Kolt, Lewis Ho, Divya Siddarth, Shahar Avin, Will Hawkins, Been Kim, Iason Gabriel, Vijay Bolina, Jack Clark, Yoshua Bengio, Paul Christiano, Allan Dafoe

Current approaches to building general-purpose AI systems tend to produce systems with both beneficial and harmful capabilities. Further progress in AI development could lead to capabilities that pose extreme risks, such as offensive cyber capabilities or strong manipulation skills. We explain why model evaluation is critical for addressing extreme risks. Developers must be able to identify dangerous capabilities (through "dangerous capability evaluations") and the propensity of models to apply their capabilities for harm (through "alignment evaluations"). These evaluations will become critical for keeping policymakers and other stakeholders informed, and for making responsible decisions about model training, deployment, and security.

Prediction-Oriented Bayesian Active Learning

Apr 17, 2023
Freddie Bickford Smith, Andreas Kirsch, Sebastian Farquhar, Yarin Gal, Adam Foster, Tom Rainforth

Information-theoretic approaches to active learning have traditionally focused on maximising the information gathered about the model parameters, most commonly by optimising the BALD score. We highlight that this can be suboptimal from the perspective of predictive performance. For example, BALD lacks a notion of an input distribution and so is prone to prioritise data of limited relevance. To address this we propose the expected predictive information gain (EPIG), an acquisition function that measures information gain in the space of predictions rather than parameters. We find that using EPIG leads to stronger predictive performance compared with BALD across a range of datasets and models, and thus provides an appealing drop-in replacement.

* Published at AISTATS 2023 
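
For intuition, here is a minimal Monte Carlo sketch of how a prediction-space score like EPIG can be estimated from posterior predictive samples (e.g. an ensemble); the function name, array shapes, and the small constant added for numerical stability are illustrative rather than the paper's reference implementation.

```python
import numpy as np

def epig_scores(probs_pool, probs_targ):
    """Monte Carlo estimate of EPIG for classification (illustrative shapes and names).

    probs_pool: [K, N, C] predictive probs for N candidate pool points under K posterior samples
    probs_targ: [K, M, C] predictive probs for M target inputs x* drawn from the input distribution
    Returns an [N] array: expected information gain about predictions on x*, per candidate point.
    """
    K = probs_pool.shape[0]
    # Joint predictive p(y, y* | x, x*) ~ (1/K) sum_k p(y | x, theta_k) p(y* | x*, theta_k)
    joint = np.einsum('knc,kmd->nmcd', probs_pool, probs_targ) / K
    # Product of the marginal predictives p(y | x) p(y* | x*)
    indep = np.einsum('nc,md->nmcd', probs_pool.mean(0), probs_targ.mean(0))
    # Mutual information I(y; y* | x, x*), averaged over the target inputs x*
    mi = (joint * (np.log(joint + 1e-12) - np.log(indep + 1e-12))).sum(axis=(-2, -1))
    return mi.mean(axis=1)
```

The pool point with the highest score would be acquired next, which is what makes EPIG a drop-in replacement for BALD in an active learning loop.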

Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation

Feb 21, 2023
Lorenz Kuhn, Yarin Gal, Sebastian Farquhar

We introduce a method to measure uncertainty in large language models. For tasks like question answering, it is essential to know when we can trust the natural language outputs of foundation models. We show that measuring uncertainty in natural language is challenging because of "semantic equivalence" -- different sentences can mean the same thing. To overcome these challenges we introduce semantic entropy -- an entropy which incorporates linguistic invariances created by shared meanings. Our method is unsupervised, uses only a single model, and requires no modifications to off-the-shelf language models. In comprehensive ablation studies we show that the semantic entropy is more predictive of model accuracy on question answering data sets than comparable baselines.
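
As a rough illustration of the idea, the sketch below clusters sampled answers by bidirectional entailment and computes an entropy over the resulting meaning clusters. The entails(a, b) callable is an assumed interface (e.g. backed by an NLI model), and cluster probabilities are approximated by sample frequencies rather than the sequence likelihoods a full implementation might aggregate.

```python
import math

def semantic_entropy(samples, entails):
    """Entropy over meaning clusters of sampled answers (illustrative sketch)."""
    clusters = []  # each cluster holds answers judged semantically equivalent
    for s in samples:
        for cluster in clusters:
            # Bidirectional entailment is used as a proxy for "means the same thing"
            if entails(s, cluster[0]) and entails(cluster[0], s):
                cluster.append(s)
                break
        else:
            clusters.append([s])
    # Approximate cluster probabilities by sample frequency
    probs = [len(c) / len(samples) for c in clusters]
    return -sum(p * math.log(p) for p in probs)
```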

CLAM: Selective Clarification for Ambiguous Questions with Large Language Models

Dec 15, 2022
Lorenz Kuhn, Yarin Gal, Sebastian Farquhar

State-of-the-art language models are often accurate on many question-answering benchmarks with well-defined questions. Yet, in real settings questions are often unanswerable without asking the user for clarifying information. We show that current SotA models often do not ask the user for clarification when presented with imprecise questions and instead provide incorrect answers or "hallucinate". To address this, we introduce CLAM, a framework that first uses the model to detect ambiguous questions, and if an ambiguous question is detected, prompts the model to ask the user for clarification. Furthermore, we show how to construct a scalable and cost-effective automatic evaluation protocol using an oracle language model with privileged information to provide clarifying information. We show that our method achieves a 20.15 percentage point accuracy improvement over SotA on a novel ambiguous question-answering data set derived from TriviaQA.
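
A minimal sketch of the selective-clarification loop described above, assuming generic llm(prompt) and ask_user(text) interfaces; the prompts and names are illustrative, not the paper's.

```python
def clam_answer(question, llm, ask_user):
    """Selective clarification sketch (assumed interfaces: llm(prompt) -> str, ask_user(text) -> str)."""
    # 1. Use the model itself to detect whether the question is ambiguous
    verdict = llm(f"Is the following question ambiguous? Answer yes or no.\nQuestion: {question}")
    if verdict.strip().lower().startswith("yes"):
        # 2. If so, ask the user (or an oracle model during automatic evaluation) for clarification
        clarifying_question = llm(f"Ask a single clarifying question for: {question}")
        clarification = ask_user(clarifying_question)
        return llm(f"Question: {question}\nClarification: {clarification}\nAnswer:")
    # 3. Otherwise answer directly
    return llm(f"Question: {question}\nAnswer:")
```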

Do Bayesian Neural Networks Need To Be Fully Stochastic?

Nov 11, 2022
Mrinank Sharma, Sebastian Farquhar, Eric Nalisnick, Tom Rainforth

We investigate the efficacy of treating all the parameters in a Bayesian neural network stochastically and find compelling theoretical and empirical evidence that this standard construction may be unnecessary. To this end, we prove that expressive predictive distributions require only small amounts of stochasticity. In particular, partially stochastic networks with only $n$ stochastic biases are universal probabilistic predictors for $n$-dimensional predictive problems. In empirical investigations, we find no systematic benefit of full stochasticity across four different inference modalities and eight datasets; partially stochastic networks can match and sometimes even outperform fully stochastic networks, despite their reduced memory costs.
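
To make the construction concrete, here is a PyTorch sketch (with illustrative names) of a partially stochastic network in which only one layer of biases is random, sampled with the reparameterisation trick; the paper's exact architectures and training objectives are not reproduced here.

```python
import torch
import torch.nn as nn

class PartiallyStochasticNet(nn.Module):
    """Deterministic network whose only stochastic parameters are the output-layer biases."""

    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())  # deterministic
        self.head = nn.Linear(hidden_dim, out_dim, bias=False)               # deterministic weights
        # Gaussian over the output biases, parameterised by a mean and log standard deviation
        self.bias_mu = nn.Parameter(torch.zeros(out_dim))
        self.bias_log_std = nn.Parameter(torch.full((out_dim,), -3.0))

    def forward(self, x, n_samples=8):
        h = self.head(self.body(x))                                  # [batch, out_dim]
        eps = torch.randn(n_samples, h.shape[-1], device=h.device)
        bias = self.bias_mu + self.bias_log_std.exp() * eps          # reparameterised samples [S, out_dim]
        return h.unsqueeze(0) + bias.unsqueeze(1)                    # predictive samples [S, batch, out_dim]
```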

Understanding Approximation for Bayesian Inference in Neural Networks

Nov 11, 2022
Sebastian Farquhar

Bayesian inference has theoretical attractions as a principled framework for reasoning about beliefs. However, the motivations of Bayesian inference which claim it to be the only 'rational' kind of reasoning do not apply in practice. They create a binary split in which all approximate inference is equally 'irrational'. Instead, we should ask ourselves how to define a spectrum of more- and less-rational reasoning that explains why we might prefer one Bayesian approximation to another. I explore approximate inference in Bayesian neural networks and consider the unintended interactions between the probabilistic model, approximating distribution, optimization algorithm, and dataset. The complexity of these interactions highlights the difficulty of any strategy for evaluating Bayesian approximations which focuses entirely on the method, outside the context of specific datasets and decision-problems. For given applications, the expected utility of the approximate posterior can measure inference quality. To assess a model's ability to incorporate different parts of the Bayesian framework we can identify desirable characteristic behaviours of Bayesian reasoning and pick decision-problems that make heavy use of those behaviours. Here, we use continual learning (testing the ability to update sequentially) and active learning (testing the ability to represent credence). But existing continual and active learning set-ups pose challenges that have nothing to do with posterior quality, which can distort their ability to evaluate Bayesian approximations. These unrelated challenges can be removed or reduced, allowing better evaluation of approximate inference methods.

* Accepted as a thesis satisfying the requirements of a D.Phil at the University of Oxford 

Discovering Agents

Aug 24, 2022
Zachary Kenton, Ramana Kumar, Sebastian Farquhar, Jonathan Richens, Matt MacDermott, Tom Everitt

Causal models of agents have been used to analyse the safety aspects of machine learning systems. But identifying agents is non-trivial -- often the causal model is just assumed by the modeler without much justification -- and modelling failures can lead to mistakes in the safety analysis. This paper proposes the first formal causal definition of agents -- roughly that agents are systems that would adapt their policy if their actions influenced the world in a different way. From this we derive the first causal discovery algorithm for discovering agents from empirical data, and give algorithms for translating between causal models and game-theoretic influence diagrams. We demonstrate our approach by resolving some previous confusions caused by incorrect causal modelling of agents.

* Some typos corrected 

Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt

Jun 16, 2022
Sören Mindermann, Jan Brauner, Muhammed Razzak, Mrinank Sharma, Andreas Kirsch, Winnie Xu, Benedikt Höltgen, Aidan N. Gomez, Adrien Morisot, Sebastian Farquhar, Yarin Gal

Training on web-scale data can take months. But most computation and time are wasted on redundant and noisy points that are already learnt or not learnable. To accelerate training, we introduce Reducible Holdout Loss Selection (RHO-LOSS), a simple but principled technique which selects approximately those points for training that most reduce the model's generalization loss. As a result, RHO-LOSS mitigates the weaknesses of existing data selection methods: techniques from the optimization literature typically select 'hard' (e.g. high loss) points, but such points are often noisy (not learnable) or less task-relevant. Conversely, curriculum learning prioritizes 'easy' points, but such points need not be trained on once learned. In contrast, RHO-LOSS selects points that are learnable, worth learning, and not yet learnt. RHO-LOSS trains in far fewer steps than prior art, improves accuracy, and speeds up training on a wide range of datasets, hyperparameters, and architectures (MLPs, CNNs, and BERT). On the large web-scraped image dataset Clothing-1M, RHO-LOSS trains in 18x fewer steps and reaches 2% higher final accuracy than uniform data shuffling.

* ICML 2022 (Follow up to arXiv:2107.02565) 
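
A sketch of the selection step, assuming an auxiliary "irreducible loss" model trained on holdout data: reducible holdout loss is estimated as the current model's loss minus the holdout model's loss, and only the highest-scoring points in each large batch are kept for the gradient update. Function and argument names are illustrative.

```python
import torch
import torch.nn.functional as F

def rho_loss_select(model, irreducible_model, xb, yb, keep_frac=0.1):
    """Pick the points in a batch with the highest reducible holdout loss (illustrative sketch)."""
    with torch.no_grad():
        train_loss = F.cross_entropy(model(xb), yb, reduction='none')                    # current model's loss
        irreducible_loss = F.cross_entropy(irreducible_model(xb), yb, reduction='none')  # holdout model's loss
        rho = train_loss - irreducible_loss            # reducible holdout loss per point
    k = max(1, int(keep_frac * len(yb)))
    idx = rho.topk(k).indices                          # train only on the top-k points
    return xb[idx], yb[idx]
```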

Path-Specific Objectives for Safer Agent Incentives

Apr 21, 2022
Sebastian Farquhar, Ryan Carey, Tom Everitt

We present a general framework for training safe agents whose naive incentives are unsafe. As an example, manipulative or deceptive behaviour can improve rewards but should be avoided. Most approaches fail here: agents maximize expected return by any means necessary. We formally describe settings with 'delicate' parts of the state which should not be used as a means to an end. We then train agents to maximize the causal effect of actions on the expected return which is not mediated by the delicate parts of state, using Causal Influence Diagram analysis. The resulting agents have no incentive to control the delicate state. We further show how our framework unifies and generalizes existing proposals.

* Presented at AAAI 2022 
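
The following toy (not the paper's algorithm, and with hypothetical names) illustrates the underlying path-specific idea: the agent is credited only with the effect of its action that does not flow through the delicate variable, which is evaluated as if a baseline action had been taken.

```python
def path_specific_return(action, baseline_action, delicate, reward):
    """Toy path-specific effect.

    delicate(a): the delicate part of the state induced by action a
    reward(a, d): return as a function of the action and the delicate state
    """
    d = delicate(baseline_action)   # hold the delicate path at its baseline value
    return reward(action, d)        # only effects of `action` not mediated by the delicate state remain
```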

Prospect Pruning: Finding Trainable Weights at Initialization using Meta-Gradients

Feb 16, 2022
Milad Alizadeh, Shyam A. Tailor, Luisa M Zintgraf, Joost van Amersfoort, Sebastian Farquhar, Nicholas Donald Lane, Yarin Gal

Pruning neural networks at initialization would enable us to find sparse models that retain the accuracy of the original network while consuming fewer computational resources for training and inference. However, current methods are insufficient to enable this optimization and lead to a large degradation in model performance. In this paper, we identify a fundamental limitation in the formulation of current methods, namely that their saliency criteria look at a single step at the start of training without taking into account the trainability of the network. While pruning iteratively and gradually has been shown to improve pruning performance, explicit consideration of the training stage that will immediately follow pruning has so far been absent from the computation of the saliency criterion. To overcome the short-sightedness of existing methods, we propose Prospect Pruning (ProsPr), which uses meta-gradients through the first few steps of optimization to determine which weights to prune. ProsPr combines an estimate of the higher-order effects of pruning on the loss and the optimization trajectory to identify the trainable sub-network. Our method achieves state-of-the-art pruning performance on a variety of vision classification tasks, with less data and in a single shot compared to existing pruning-at-initialization methods.
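
A rough sketch of the meta-gradient computation: apply a pruning mask to the initial weights, unroll a few differentiable SGD steps, and take the gradient of the loss after those steps with respect to the mask as a saliency score. The interfaces and the exact saliency definition here are illustrative, not the paper's reference implementation.

```python
import torch

def prospr_saliency(w0, forward, loss_fn, batches, lr=0.1, steps=3):
    """Meta-gradient pruning saliency (illustrative sketch).

    w0: initial weight tensor; forward(w, x) -> logits; batches: list of (x, y) with len > steps.
    Returns a per-weight score; weights with the smallest scores would be pruned.
    """
    mask = torch.ones_like(w0, requires_grad=True)
    w = w0 * mask                                                # masked initial weights
    for x, y in batches[:steps]:
        loss = loss_fn(forward(w, x), y)
        (g,) = torch.autograd.grad(loss, w, create_graph=True)   # keep the graph for the meta-gradient
        w = w - lr * g                                           # unrolled, differentiable SGD step
    x, y = batches[steps]
    final_loss = loss_fn(forward(w, x), y)                       # loss after the steps that follow pruning
    (meta_grad,) = torch.autograd.grad(final_loss, mask)
    return meta_grad.abs()
```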
