Yee Whye Teh

SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning

Aug 02, 2023
Ning Miao, Yee Whye Teh, Tom Rainforth

The recent progress in large language models (LLMs), especially the invention of chain-of-thought (CoT) prompting, has made it possible to solve reasoning problems. However, even the strongest LLMs still struggle with more complicated problems that require non-linear thinking and multi-step reasoning. In this work, we explore whether LLMs can recognize their own errors without resorting to external resources. In particular, we investigate whether they can be used to identify individual errors within a step-by-step reasoning chain. To this end, we propose a zero-shot verification scheme to recognize such errors. We then use this verification scheme to improve question-answering performance by performing weighted voting over different generated answers. We test the method on three math datasets (GSM8K, MathQA, and MATH) and find that it successfully recognizes errors and, in turn, increases final predictive performance.
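
As a concrete illustration of the final aggregation step, the sketch below shows confidence-weighted voting over sampled answers, assuming each candidate solution has already been assigned a checker confidence in [0, 1]; the checking prompts themselves and the exact weighting scheme are not reproduced here.

```python
# A minimal sketch of confidence-weighted voting over sampled answers.
from collections import defaultdict

def weighted_vote(candidates):
    """candidates: list of (answer, confidence) pairs from independent CoT samples."""
    scores = defaultdict(float)
    for answer, confidence in candidates:
        scores[answer] += confidence  # each sample votes with its checker confidence
    return max(scores, key=scores.get)

# Example: two moderately confident, agreeing samples outweigh one confident outlier.
print(weighted_vote([("42", 0.6), ("42", 0.5), ("17", 0.9)]))  # -> "42"
```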

Drug Discovery under Covariate Shift with Domain-Informed Prior Distributions over Functions

Jul 14, 2023
Leo Klarner, Tim G. J. Rudner, Michael Reutlinger, Torsten Schindler, Garrett M. Morris, Charlotte Deane, Yee Whye Teh

Accelerating the discovery of novel and more effective therapeutics is an important pharmaceutical problem in which deep learning is playing an increasingly significant role. However, real-world drug discovery tasks are often characterized by a scarcity of labeled data and significant covariate shift, a setting that poses a challenge to standard deep learning methods. In this paper, we present Q-SAVI, a probabilistic model able to address these challenges by encoding explicit prior knowledge of the data-generating process into a prior distribution over functions, presenting researchers with a transparent and probabilistically principled way to encode data-driven modeling preferences. Building on a novel, gold-standard bioactivity dataset that facilitates a meaningful comparison of models in an extrapolative regime, we explore different approaches to induce data shift and construct a challenging evaluation setup. We then demonstrate that using Q-SAVI to integrate contextualized prior knowledge of drug-like chemical space into the modeling process affords substantial gains in predictive accuracy and calibration, outperforming a broad range of state-of-the-art self-supervised pre-training and domain adaptation techniques.
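
The following is a minimal, hypothetical sketch of the function-space idea: the network is nudged toward a domain-informed prior over function values at context inputs drawn from unlabelled drug-like chemical space. The `function_space_penalty` helper, the squared-error surrogate, and the zero-mean prior are illustrative assumptions, not the paper's actual objective.

```python
import torch
import torch.nn as nn

def function_space_penalty(model, prior_mean_fn, context_x, prior_scale=1.0):
    """Squared-error surrogate for matching the model to a Gaussian prior over
    function values at context inputs drawn from the relevant chemical space."""
    preds = model(context_x)               # model's function values at the context points
    prior_mean = prior_mean_fn(context_x)  # domain-informed prior mean at those points
    return ((preds - prior_mean) ** 2).mean() / (2 * prior_scale ** 2)

# Toy usage: a linear model penalised toward a zero-mean prior at unlabelled
# context molecules (represented here by random feature vectors).
model = nn.Linear(16, 1)
context_x = torch.randn(32, 16)
penalty = function_space_penalty(model, lambda x: torch.zeros(len(x), 1), context_x)
```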

* Published in the Proceedings of the 40th International Conference on Machine Learning (ICML 2023) 

Geometric Neural Diffusion Processes

Jul 11, 2023
Emile Mathieu, Vincent Dutordoir, Michael J. Hutchinson, Valentin De Bortoli, Yee Whye Teh, Richard E. Turner

Denoising diffusion models have proven to be a flexible and effective paradigm for generative modelling. Their recent extension to infinite-dimensional Euclidean spaces has allowed for the modelling of stochastic processes. However, many problems in the natural sciences incorporate symmetries and involve data living in non-Euclidean spaces. In this work, we extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimensional modelling. We do so by a) constructing a noising process which admits, as limiting distribution, a geometric Gaussian process that transforms under the symmetry group of interest, and b) approximating the score with a neural network that is equivariant w.r.t. this group. We show that under these conditions, the generative functional model admits the same symmetry. We demonstrate the scalability and capacity of the model, using a novel Langevin-based conditional sampler, to fit complex scalar and vector fields, with Euclidean and spherical codomain, on synthetic and real-world weather data.
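
As a toy illustration of point (a), the sketch below replaces the usual white-noise corruption with noise drawn from a Gaussian process on a grid, so the forward process limits to a correlated Gaussian prior over functions; the RBF kernel and noise schedule are placeholder choices, not the paper's construction.

```python
# A minimal sketch (not the paper's implementation) of a forward noising process
# whose limiting distribution is a Gaussian process rather than white noise.
import numpy as np

def rbf_kernel(xs, lengthscale=0.2):
    d = xs[:, None] - xs[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2) + 1e-6 * np.eye(len(xs))

def noise_to_gp(x0, xs, alpha_bar_t, rng=np.random.default_rng(0)):
    """Diffuse a function sample x0 (values on grid xs) toward the GP prior:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ GP(0, K)."""
    eps = rng.multivariate_normal(np.zeros(len(xs)), rbf_kernel(xs))
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

xs = np.linspace(0, 1, 50)
x0 = np.sin(2 * np.pi * xs)                 # a clean function sample
xt = noise_to_gp(x0, xs, alpha_bar_t=0.1)   # mostly correlated GP noise at late times
```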

Kalman Filter for Online Classification of Non-Stationary Data

Jun 14, 2023
Michalis K. Titsias, Alexandre Galashov, Amal Rannen-Triki, Razvan Pascanu, Yee Whye Teh, Jorg Bornschein

In Online Continual Learning (OCL), a learning system receives a stream of data and sequentially performs prediction and training steps. Important challenges in OCL concern automatic adaptation to the particular non-stationary structure of the data and quantification of predictive uncertainty. Motivated by these challenges, we introduce a probabilistic Bayesian online learning model that uses a (possibly pretrained) neural representation and a state space model over the linear predictor weights. Non-stationarity of the linear predictor weights is modelled using a parameter drift transition density, parametrized by a coefficient that quantifies forgetting. Inference in the model is implemented with efficient Kalman filter recursions which track the posterior distribution over the linear weights, while online SGD updates of the transition-dynamics coefficient allow the model to adapt to the non-stationarity seen in the data. While the framework is developed assuming a linear Gaussian model, we also extend it to handle classification problems and to fine-tune the deep learning representation. In a set of multi-class classification experiments using datasets such as CIFAR-100 and CLOC, we demonstrate the predictive ability of the model and its flexibility in capturing non-stationarity.
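
A minimal sketch of the linear-Gaussian core of such a model is given below: Kalman filter recursions over the linear predictor weights with a drift transition w_t = gamma * w_{t-1} + noise. The drift coefficient gamma (which the paper adapts online via SGD) is simply fixed here, and the classification and representation fine-tuning extensions are omitted.

```python
import numpy as np

def kalman_step(m, P, phi, y, gamma=0.99, q=1e-3, r=0.1):
    """One predict/update step. m, P: posterior mean/cov over the linear weights;
    phi: feature vector from the (frozen) representation; y: scalar target."""
    # Predict: apply the drift transition w_t = gamma * w_{t-1} + noise.
    m_pred = gamma * m
    P_pred = gamma ** 2 * P + q * np.eye(len(m))
    # Update: condition on the observation y = phi @ w + observation noise.
    s = phi @ P_pred @ phi + r          # predictive variance of y
    k = P_pred @ phi / s                # Kalman gain
    m_new = m_pred + k * (y - phi @ m_pred)
    P_new = P_pred - np.outer(k, phi) @ P_pred
    return m_new, P_new

# Toy usage on a stream of (feature, target) pairs.
m, P = np.zeros(3), np.eye(3)
for phi, y in [(np.array([1.0, 0.5, -0.2]), 0.7)]:
    m, P = kalman_step(m, P, phi, y)
```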

Deep Stochastic Processes via Functional Markov Transition Operators

May 24, 2023
Jin Xu, Emilien Dupont, Kaspar Märtens, Tom Rainforth, Yee Whye Teh

We introduce Markov Neural Processes (MNPs), a new class of Stochastic Processes (SPs) constructed by stacking sequences of neurally parameterised Markov transition operators in function space. We prove that these Markov transition operators can preserve the exchangeability and consistency of SPs. The proposed iterative construction therefore adds substantial flexibility and expressivity to the original framework of Neural Processes (NPs) without compromising consistency or adding restrictions. Our experiments demonstrate clear advantages of MNPs over baseline models on a variety of tasks.
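
A toy, hypothetical sketch of the stacking idea follows: function values at a fixed set of inputs are pushed through a sequence of stochastic transition maps. Real MNPs define these operators in function space with exchangeability and consistency guarantees; the pointwise Gaussian transitions below are placeholders chosen only to show the iterative construction.

```python
import torch
import torch.nn as nn

class ToyTransition(nn.Module):
    """A per-point stochastic map (x, y) -> (x, y'), standing in for a Markov
    transition operator; real MNP operators act on whole functions."""
    def __init__(self, width=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, width), nn.ReLU(), nn.Linear(width, 2))

    def forward(self, xy):  # xy: (num_points, 2) pairs of (input, current value)
        mean, logvar = self.net(xy).chunk(2, dim=-1)
        new_y = mean + torch.randn_like(mean) * (0.5 * logvar).exp()  # stochastic step
        return torch.cat([xy[:, :1], new_y], dim=-1)                  # keep inputs, update values

transitions = nn.ModuleList([ToyTransition() for _ in range(3)])
xy = torch.stack([torch.linspace(0, 1, 20), torch.zeros(20)], dim=-1)
for t in transitions:   # iteratively transform the function sample
    xy = t(xy)
```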

* 18 pages, 5 figures 

Incorporating Unlabelled Data into Bayesian Neural Networks

Apr 04, 2023
Mrinank Sharma, Tom Rainforth, Yee Whye Teh, Vincent Fortuin

We develop a contrastive framework for learning better prior distributions for Bayesian Neural Networks (BNNs) using unlabelled data. With this framework, we propose a practical BNN algorithm that offers the label-efficiency of self-supervised learning and the principled uncertainty estimates of Bayesian methods. Finally, we demonstrate the advantages of our approach for data-efficient learning in semi-supervised and low-budget active learning problems.
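
One plausible, hypothetical instantiation of this idea (not the paper's algorithm) is to treat frozen self-supervised features learned from the unlabelled data as the representation for an exact Bayesian linear readout, which already yields label-efficient predictions with calibrated uncertainty.

```python
import numpy as np

def bayesian_linear_posterior(features, targets, prior_var=1.0, noise_var=0.1):
    """Posterior over readout weights given frozen (e.g. contrastively learned) features."""
    d = features.shape[1]
    precision = features.T @ features / noise_var + np.eye(d) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ features.T @ targets / noise_var
    return mean, cov

# Toy usage with random stand-ins for self-supervised embeddings.
features = np.random.randn(100, 8)
targets = np.random.randn(100)
mean, cov = bayesian_linear_posterior(features, targets)
```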

Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation

Feb 20, 2023
Bobby He, James Martens, Guodong Zhang, Aleksandar Botev, Andrew Brock, Samuel L Smith, Yee Whye Teh

Skip connections and normalisation layers are two standard architectural components that are ubiquitous in the training of Deep Neural Networks (DNNs), yet their precise roles are poorly understood. Recent approaches such as Deep Kernel Shaping have made progress towards reducing our reliance on them, using insights from wide NN kernel theory to improve signal propagation in vanilla DNNs (which we define as networks without skips or normalisation). However, these approaches are incompatible with the self-attention layers present in transformers, whose kernels are intrinsically more complicated to analyse and control. The question therefore remains: is it possible to train deep vanilla transformers? We answer this question in the affirmative by designing several approaches that use combinations of parameter initialisations, bias matrices and location-dependent rescaling to achieve faithful signal propagation in vanilla transformers. Our methods address various intricacies specific to signal propagation in transformers, including the interaction with positional encoding and causal masking. In experiments on WikiText-103 and C4, our approaches enable deep transformers without normalisation to train at speeds matching their standard counterparts, and deep vanilla transformers to reach the same performance as standard ones after about 5 times more iterations.
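
The sketch below illustrates one ingredient mentioned in the abstract, a bias matrix added to the attention logits so that, at initialisation, each token attends mostly to itself and signal can propagate without a residual branch; the specific diagonal bias used here is illustrative rather than the paper's exact construction.

```python
import torch
import torch.nn.functional as F

def biased_self_attention(q, k, v, self_bias=5.0, causal=True):
    """q, k, v: (seq, dim). Adds a diagonal bias so attention starts near identity."""
    seq, dim = q.shape
    logits = q @ k.T / dim ** 0.5
    logits = logits + self_bias * torch.eye(seq)          # bias toward the identity map
    if causal:
        mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
        logits = logits.masked_fill(mask, float("-inf"))  # causal masking
    return F.softmax(logits, dim=-1) @ v

x = torch.randn(10, 32)
out = biased_self_attention(x, x, x)   # near-identity mixing at initialisation
```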

* ICLR 2023 

Modality-Agnostic Variational Compression of Implicit Neural Representations

Jan 23, 2023
Jonathan Richard Schwarz, Jihoon Tack, Yee Whye Teh, Jaeho Lee, Jinwoo Shin

We introduce a modality-agnostic neural data compression algorithm based on a functional view of data and parameterised as an Implicit Neural Representation (INR). Bridging the gap between latent coding and sparsity, we obtain compact latent representations which are non-linearly mapped to a soft gating mechanism capable of specialising a shared INR base network to each data item through subnetwork selection. After obtaining a dataset of such compact latent representations, we directly optimise the rate/distortion trade-off in this modality-agnostic space using non-linear transform coding. We term this method Variational Compression of Implicit Neural Representations (VC-INR) and show improved performance at the same representational capacity before quantisation, while also outperforming quantisation schemes previously used for other INR-based techniques. Our experiments demonstrate strong results over a large set of diverse data modalities using the same algorithm without any modality-specific inductive biases. We show results on images, climate data, 3D shapes and scenes as well as audio and video, introducing VC-INR as the first INR-based method to outperform codecs as well-known and diverse as JPEG 2000, MP3 and AVC/HEVC on their respective modalities.
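
A hedged sketch of the soft-gating mechanism follows: a compact per-datum latent is mapped to per-unit gates that modulate a shared INR base network. The sine activations, layer sizes and coordinate/RGB dimensions are illustrative assumptions, and quantisation plus the non-linear transform-coding stage are omitted.

```python
import torch
import torch.nn as nn

class GatedINR(nn.Module):
    """Shared INR base network specialised per data item via latent-driven soft gates."""
    def __init__(self, latent_dim=64, hidden=256):
        super().__init__()
        self.layer1 = nn.Linear(2, hidden)      # coordinate input (e.g. pixel x, y)
        self.layer2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, 3)         # e.g. RGB output
        self.to_gates = nn.Linear(latent_dim, 2 * hidden)  # latent -> per-unit soft gates

    def forward(self, coords, latent):
        g1, g2 = torch.sigmoid(self.to_gates(latent)).chunk(2, dim=-1)
        h = torch.sin(self.layer1(coords)) * g1   # gates select a subnetwork of units
        h = torch.sin(self.layer2(h)) * g2
        return self.out(h)

inr = GatedINR()
rgb = inr(torch.rand(1024, 2), torch.randn(64))   # (1024, 3)
```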

On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations

Dec 28, 2022
Tim G. J. Rudner, Cong Lu, Michael A. Osborne, Yarin Gal, Yee Whye Teh

KL-regularized reinforcement learning from expert demonstrations has proved successful in improving the sample efficiency of deep reinforcement learning algorithms, allowing them to be applied to challenging physical real-world tasks. However, we show that KL-regularized reinforcement learning with behavioral reference policies derived from expert demonstrations can suffer from pathological training dynamics that can lead to slow, unstable, and suboptimal online learning. We show empirically that the pathology occurs for commonly chosen behavioral policy classes and demonstrate its impact on sample efficiency and online policy performance. Finally, we show that the pathology can be remedied by non-parametric behavioral reference policies and that this allows KL-regularized reinforcement learning to significantly outperform state-of-the-art approaches on a variety of challenging locomotion and dexterous hand manipulation tasks.
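
For concreteness, a minimal sketch of the KL-regularised objective referred to above is given below, with diagonal Gaussian policies standing in for the actual policy classes studied in the paper.

```python
import torch
from torch.distributions import Normal, kl_divergence

def kl_regularised_loss(policy_mean, policy_std, ref_mean, ref_std, q_value, alpha=0.1):
    """Negative of E[Q] - alpha * KL(pi || pi_b) for diagonal Gaussian policies."""
    pi = Normal(policy_mean, policy_std)       # current policy
    pi_b = Normal(ref_mean, ref_std)           # behavioural reference policy from demonstrations
    return -(q_value - alpha * kl_divergence(pi, pi_b).sum(-1)).mean()

# Toy usage: a very confident reference policy (tiny ref_std) makes the KL term
# dominate away from the demonstrations, the flavour of pathology described above.
loss = kl_regularised_loss(torch.zeros(4, 2), torch.ones(4, 2),
                           torch.zeros(4, 2), 0.01 * torch.ones(4, 2),
                           q_value=torch.randn(4))
```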

* Published in Advances in Neural Information Processing Systems 34 (NeurIPS 2021) 

NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research

Nov 15, 2022
Jorg Bornschein, Alexandre Galashov, Ross Hemsley, Amal Rannen-Triki, Yutian Chen, Arslan Chaudhry, Xu Owen He, Arthur Douillard, Massimo Caccia, Qixuang Feng, Jiajun Shen, Sylvestre-Alvise Rebuffi, Kitty Stacpoole, Diego de las Casas, Will Hawkins, Angeliki Lazaridou, Yee Whye Teh, Andrei A. Rusu, Razvan Pascanu, Marc'Aurelio Ranzato

We introduce the Never Ending VIsual-classification Stream (NEVIS'22), a benchmark consisting of a stream of over 100 visual classification tasks, sorted chronologically and extracted from papers sampled uniformly from computer vision proceedings spanning the last three decades. The resulting stream reflects what the research community thought was meaningful at any point in time. Despite being limited to classification, the stream offers a rich diversity of tasks, ranging from OCR and texture analysis to crowd counting and scene recognition. The diversity is also reflected in the wide range of dataset sizes, spanning over four orders of magnitude. Overall, NEVIS'22 poses an unprecedented challenge for current sequential learning approaches due to the scale and diversity of tasks, yet with a low entry barrier, as it is limited to a single modality and each task is a classical supervised learning problem. Moreover, we provide a reference implementation including strong baselines and a simple evaluation protocol to compare methods in terms of their trade-off between accuracy and compute. We hope that NEVIS'22 can be useful to researchers working on continual learning, meta-learning, AutoML and, more generally, sequential learning, and help these communities join forces towards more robust models that adapt efficiently to a never-ending stream of data. Implementations have been made available at https://github.com/deepmind/dm_nevis.
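
A hypothetical sketch of the sequential protocol described above is given below; the task interface and the idea of the learner reporting its training compute are placeholders, not the actual dm_nevis API.

```python
def run_stream(learner, task_stream):
    """Sequentially train and evaluate on a chronological task stream, tracking
    the accuracy-versus-compute trade-off the benchmark reports."""
    results, total_flops = [], 0.0
    for task in task_stream:                         # tasks arrive in chronological order
        flops = learner.train(task.train_data)       # learner may transfer from earlier tasks
        acc = learner.evaluate(task.test_data)
        total_flops += flops
        results.append((task.name, acc, total_flops))  # point on the trade-off curve
    return results
```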
