Frank Wood

Annealed Importance Sampling with q-Paths

Dec 14, 2020

Rob Brekelmans, Vaden Masrani, Thang Bui, Frank Wood, Aram Galstyan, Greg Ver Steeg, Frank Nielsen

Abstract: Annealed importance sampling (AIS) is the gold standard for estimating partition functions or marginal likelihoods, corresponding to importance sampling over a path of distributions between a tractable base and an unnormalized target. While AIS yields an unbiased estimator for any path, existing literature has been primarily limited to the geometric mixture or moment-averaged paths associated with the exponential family and KL divergence. We explore AIS using $q$-paths, which include the geometric path as a special case and are related to the homogeneous power mean, deformed exponential family, and $\alpha$-divergence.

* NeurIPS Workshop on Deep Learning through Information Geometry (Best Paper Award)
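
The relationship between the geometric path and a power-mean path mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the intermediate unnormalized density is a weighted power mean of the base and target densities with exponent $1 - q$, which reduces to the geometric mixture as $q \to 1$ and to the arithmetic mixture at $q = 0$. The function name `q_path_density` and its arguments are hypothetical, chosen for illustration only, and are not taken from the paper's code.

```python
import math


def q_path_density(p0, p1, beta, q):
    """Unnormalized intermediate density between p0 and p1 at mixing weight beta.

    Assumption for this sketch: the q-path is the weighted power mean of the
    two (positive) density values with exponent r = 1 - q.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if p0 <= 0.0 or p1 <= 0.0:
        raise ValueError("density values must be positive")
    r = 1.0 - q
    if abs(r) < 1e-12:
        # Limit q -> 1: geometric mixture p0^(1 - beta) * p1^beta.
        return math.exp((1.0 - beta) * math.log(p0) + beta * math.log(p1))
    # General case: weighted power mean with exponent r.
    return ((1.0 - beta) * p0 ** r + beta * p1 ** r) ** (1.0 / r)


if __name__ == "__main__":
    for q in (0.0, 0.5, 0.999, 1.0):
        print(q, q_path_density(1.0, 4.0, 0.5, q))
```

Evaluating the sketch at a single point for several values of `q` shows the path interpolating between the arithmetic and geometric mixtures; it does not perform AIS itself, which would additionally require transition kernels and importance weights.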

Ensemble Squared: A Meta AutoML System

Dec 10, 2020

Jason Yoo, Tony Joseph, Dylan Yung, S. Ali Nasseri, Frank Wood

Abstract: The continuing rise in the number of problems amenable to machine learning solutions, coupled with simultaneous growth in both computing power and variety of machine learning techniques, has led to an explosion of interest in automated machine learning (AutoML). This paper presents Ensemble Squared (Ensemble$^2$), a "meta" AutoML system that ensembles at the level of AutoML systems. Ensemble$^2$ exploits the diversity of existing, competing AutoML systems by ensembling the top-performing models simultaneously generated by a set of them. Our work shows that diversity in AutoML systems is sufficient to justify ensembling at the AutoML system level. In demonstrating this, we also establish a new state-of-the-art AutoML result on the OpenML classification challenge.

Gaussian Process Bandit Optimization of the Thermodynamic Variational Objective

Oct 31, 2020

Vu Nguyen, Vaden Masrani, Rob Brekelmans, Michael A. Osborne, Frank Wood

Abstract: Achieving the full promise of the Thermodynamic Variational Objective (TVO), a recently proposed variational lower bound on the log evidence involving a one-dimensional Riemann integral approximation, requires choosing a "schedule" of sorted discretization points. This paper introduces a bespoke Gaussian process bandit optimization method for automatically choosing these points. Our approach not only automates their one-time selection, but also dynamically adapts their positions over the course of optimization, leading to improved model learning and inference. We provide theoretical guarantees that our bandit optimization converges to the regret-minimizing choice of integration points. Empirical validation of our algorithm is provided in terms of improved learning and inference in Variational Autoencoders and Sigmoid Belief Networks.

* NeurIPS 2020
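
To see why the choice of a sorted "schedule" matters for a Riemann-sum bound, consider the following minimal sketch. It assumes a left Riemann sum over $[0, 1]$ and a non-decreasing integrand, in which case the sum never exceeds the integral. The toy integrand `toy_integrand` and the function name `left_riemann_sum` are hypothetical stand-ins, not the TVO integrand or the authors' implementation.

```python
def left_riemann_sum(integrand, schedule):
    """Left Riemann sum of `integrand` on [0, 1] over a schedule of points.

    `schedule` holds the left endpoints; it must start at 0.0 and stay
    below 1.0. The final interval is closed off at 1.0.
    """
    points = sorted(schedule)
    if not points or points[0] != 0.0 or points[-1] >= 1.0:
        raise ValueError("schedule must start at 0.0 and stay below 1.0")
    edges = points + [1.0]
    # Each term is (interval width) * (integrand at the left endpoint).
    return sum(
        (edges[k + 1] - edges[k]) * integrand(edges[k])
        for k in range(len(points))
    )


def toy_integrand(beta):
    """Hypothetical non-decreasing integrand; its exact integral on [0, 1] is 1/3."""
    return beta ** 2


if __name__ == "__main__":
    for schedule in ([0.0, 0.5], [0.0, 0.25, 0.5, 0.75], [0.0, 0.6, 0.9]):
        print(schedule, left_riemann_sum(toy_integrand, schedule))
```

With the same number of points, different placements give bounds of different tightness, which is the quantity a schedule-selection method would aim to improve.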

Uncertainty in Neural Processes

Oct 08, 2020

Saeid Naderiparizi, Kenny Chiu, Benjamin Bloem-Reddy, Frank Wood

Abstract: We explore the effects of architecture and training objective choice on amortized posterior predictive inference in probabilistic conditional generative models. We intend this work as a counterpoint to a recent trend in the literature that stresses achieving good samples when the amount of conditioning data is large. We instead focus our attention on the case where the amount of conditioning data is small. We highlight specific architecture and objective choices that we find lead to qualitative and quantitative improvement to posterior inference in this low data regime. Specifically, we explore the effects of choices of pooling operator and variational family on posterior quality in neural processes. Superior posterior predictive samples drawn from our novel neural process architectures are demonstrated via image completion/in-painting experiments.

Assisting the Adversary to Improve GAN Training

Oct 03, 2020

Andreas Munk, William Harvey, Frank Wood

Abstract: We propose a method for improved training of generative adversarial networks (GANs). Some of the most popular methods for improving the stability and performance of GANs involve constraining or regularizing the discriminator. Our method, on the other hand, involves regularizing the generator. It can be used alongside existing approaches to GAN training and is simple and straightforward to implement. Our method is motivated by a common mismatch between theoretical analysis and practice: analysis often assumes that the discriminator reaches its optimum on each iteration. In practice, this is essentially never true, often leading to poor gradient estimates for the generator. To address this, we introduce the Adversary's Assistant (AdvAs). It is a theoretically motivated penalty imposed on the generator based on the norm of the gradients used to train the discriminator. This encourages the generator to move towards points where the discriminator is optimal. We demonstrate the effect of applying AdvAs to several GAN objectives, datasets and network architectures. The results indicate a reduction in the mismatch between theory and practice and that AdvAs can lead to improvement of GAN training, as measured by FID scores.
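
One way to write down the penalty described in the abstract above is sketched here. All symbols are assumptions introduced for illustration: $\theta$ denotes the generator parameters, $\phi$ the discriminator parameters, $\mathcal{L}_G$ and $\mathcal{L}_D$ the generator and discriminator losses, and $\lambda \geq 0$ a penalty weight. The squared Euclidean norm is one plausible reading of "the norm of the gradients" and is not confirmed by the abstract.

```latex
\mathcal{L}_G^{\mathrm{AdvAs}}(\theta; \phi)
  = \mathcal{L}_G(\theta; \phi)
  + \lambda \, \bigl\lVert \nabla_{\phi} \mathcal{L}_D(\theta; \phi) \bigr\rVert_2^2
```

Under this reading, the penalty term vanishes wherever the discriminator sits at a stationary point of its own loss, so minimizing it with respect to $\theta$ pushes the generator toward configurations for which the current discriminator is closer to optimal.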

All in the Exponential Family: Bregman Duality in Thermodynamic Variational Inference

Jul 01, 2020

Rob Brekelmans, Vaden Masrani, Frank Wood, Greg Ver Steeg, Aram Galstyan

Abstract: The recently proposed Thermodynamic Variational Objective (TVO) leverages thermodynamic integration to provide a family of variational inference objectives, which both tighten and generalize the ubiquitous Evidence Lower Bound (ELBO). However, the tightness of TVO bounds was not previously known, an expensive grid search was used to choose a "schedule" of intermediate distributions, and model learning suffered with ostensibly tighter bounds. In this work, we propose an exponential family interpretation of the geometric mixture curve underlying the TVO and various path sampling methods, which allows us to characterize the gap in TVO likelihood bounds as a sum of KL divergences. We propose to choose intermediate distributions using equal spacing in the moment parameters of our exponential family, which matches grid search performance and allows the schedule to adaptively update over the course of training. Finally, we derive a doubly reparameterized gradient estimator which improves model learning and allows the TVO to benefit from more refined bounds. To further contextualize our contributions, we provide a unified framework for understanding thermodynamic integration and the TVO using Taylor series remainders.

* ICML 2020

Semi-supervised Sequential Generative Models

Jun 30, 2020

Michael Teng, Tuan Anh Le, Adam Scibior, Frank Wood

Abstract: We introduce a novel objective for training deep generative time-series models with discrete latent variables for which supervision is only sparsely available. This instance of semi-supervised learning is challenging for existing methods, because the exponential number of possible discrete latent configurations results in high variance gradient estimators. We first overcome this problem by extending the standard semi-supervised generative modeling objective with reweighted wake-sleep. However, we find that this approach still suffers when the frequency of available labels varies between training sequences. Finally, we introduce a unified objective inspired by teacher-forcing and show that this approach is robust to variable length supervision. We call the resulting method caffeinated wake-sleep (CWS) to emphasize its additional dependence on real data. We demonstrate its effectiveness with experiments on MNIST, handwriting, and fruit fly trajectory data.

* Accepted to Uncertainty in Artificial Intelligence 2020

Improving Few-Shot Visual Classification with Unlabelled Examples

Jun 17, 2020

Peyman Bateni, Jarred Barber, Jan-Willem van de Meent, Frank Wood

Abstract: We propose a transductive meta-learning method that uses unlabelled instances to improve few-shot image classification performance. Our approach combines a regularized Mahalanobis-distance-based soft k-means clustering procedure with a state-of-the-art neural adaptive feature extractor to achieve improved test-time classification accuracy using unlabelled data. We evaluate our method on transductive few-shot learning tasks, in which the goal is to jointly predict labels for query (test) examples given a set of support (training) examples. We achieve new state-of-the-art in-domain performance on Meta-Dataset, and improve accuracy on mini- and tiered-ImageNet as compared to other conditional neural adaptive methods that use the same pre-trained feature extractor.
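
The soft assignment step of a Mahalanobis-distance-based soft k-means can be sketched as below. This is a simplified illustration, not the authors' procedure: it assumes diagonal covariances (per-dimension variances) and computes responsibilities as a softmax over negative squared Mahalanobis distances. The function name `soft_assignments` and its arguments are hypothetical.

```python
import math


def soft_assignments(x, means, variances):
    """Soft cluster responsibilities for a single feature vector x.

    Assumptions for this sketch: each cluster has a diagonal covariance given
    by a list of per-dimension variances, and the responsibility of a cluster
    is proportional to exp(-squared Mahalanobis distance).
    """
    distances = [
        sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
        for mean, var in zip(means, variances)
    ]
    # Subtract the smallest distance before exponentiating for stability.
    smallest = min(distances)
    weights = [math.exp(-(d - smallest)) for d in distances]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    means = [[0.0, 0.0], [2.0, 0.0]]
    unit = [[1.0, 1.0], [1.0, 1.0]]
    wide = [[1.0, 1.0], [4.0, 1.0]]
    print(soft_assignments([1.0, 0.0], means, unit))
    print(soft_assignments([1.0, 0.0], means, wide))
```

The second call shows the effect of the Mahalanobis metric: a point equidistant from both means in Euclidean terms is assigned more strongly to the cluster with the larger variance along the separating direction.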

Planning as Inference in Epidemiological Models

Apr 03, 2020

Frank Wood, Andrew Warrington, Saeid Naderiparizi, Christian Weilbach, Vaden Masrani, William Harvey, Adam Scibior, Boyan Beronov, Ali Nasseri

Abstract: In this work we demonstrate how existing software tools can be used to automate parts of infectious disease-control policy-making by performing inference in existing epidemiological dynamics models. The kinds of inference tasks undertaken include computing, for planning purposes, the posterior distribution over putatively controllable, via direct policy-making choices, simulation model parameters that give rise to acceptable disease progression outcomes. Neither the full capabilities of such inference automation software tools nor their utility for planning is widely disseminated at the current time. Timely gains in understanding about these tools and how they can be used may lead to more fine-grained and less economically damaging policy prescriptions, particularly during the current COVID-19 pandemic.

Coping With Simulators That Don't Always Return

Mar 28, 2020

Andrew Warrington, Saeid Naderiparizi, Frank Wood

Abstract: Deterministic models are approximations of reality that are easy to interpret and often easier to build than stochastic alternatives. Unfortunately, as nature is capricious, observational data can never be fully explained by deterministic models in practice. Observation and process noise need to be added to adapt deterministic models to behave stochastically, such that they are capable of explaining and extrapolating from noisy data. We investigate and address computational inefficiencies that arise from adding process noise to deterministic simulators that fail to return for certain inputs, a property we describe as "brittle." We show how to train a conditional normalizing flow to propose perturbations such that the simulator succeeds with high probability, increasing computational efficiency.

* AISTATS 2020 camera ready, version 1.0
