Frank Wood

Online Learning Rate Adaptation with Hypergradient Descent

Feb 26, 2018
Atilim Gunes Baydin, Robert Cornish, David Martinez Rubio, Mark Schmidt, Frank Wood

We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice. We demonstrate the effectiveness of the method in a range of optimization problems by applying it to stochastic gradient descent, stochastic gradient descent with Nesterov momentum, and Adam, showing that it significantly reduces the need for the manual tuning of the initial learning rate for these commonly used algorithms. Our method works by dynamically updating the learning rate during optimization using the gradient with respect to the learning rate of the update rule itself. Computing this "hypergradient" needs little additional computation, requires only one extra copy of the original gradient to be stored in memory, and relies upon nothing more than what is provided by reverse-mode automatic differentiation.

* In Sixth International Conference on Learning Representations (ICLR), Vancouver, Canada, April 30 -- May 3, 2018. https://openreview.net/forum?id=BkrsAzWAb
* 11 pages, 4 figures
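
As a rough illustration of the idea in the abstract above, the sketch below adapts the learning rate of plain gradient descent with a hypergradient step. It assumes the SGD form of the update, in which the learning rate moves in proportion to the dot product of the current and previous gradients. The function name `sgd_hd`, the hyper learning rate `beta`, and the one-dimensional quadratic used as a test problem are illustrative choices, not taken from this listing.

```python
def sgd_hd(grad, theta0, alpha0=0.01, beta=1e-4, steps=100):
    """Gradient descent with a hypergradient-adapted learning rate (sketch)."""
    theta = list(theta0)
    alpha = alpha0
    prev_grad = [0.0] * len(theta)  # the one extra gradient copy kept in memory
    for _ in range(steps):
        g = grad(theta)
        # The previous update was theta = theta_prev - alpha * prev_grad, so the
        # derivative of the loss w.r.t. alpha is -(g . prev_grad).
        # Descending that hypergradient adds beta * (g . prev_grad) to alpha.
        alpha += beta * sum(gi * pi for gi, pi in zip(g, prev_grad))
        theta = [t - alpha * gi for t, gi in zip(theta, g)]
        prev_grad = g
    return theta, alpha


# Hypothetical test problem: f(theta) = 0.5 * theta**2, whose gradient is theta.
theta, alpha = sgd_hd(lambda th: [th[0]], [5.0], alpha0=0.01, beta=0.001)
```

Starting from a deliberately small `alpha0`, the learning rate grows while successive gradients point the same way, which is the sense in which the initial value needs less manual tuning.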

Improvements to Inference Compilation for Probabilistic Programming in Large-Scale Scientific Simulators

Dec 21, 2017
Mario Lezcano Casado, Atilim Gunes Baydin, David Martinez Rubio, Tuan Anh Le, Frank Wood, Lukas Heinrich, Gilles Louppe, Kyle Cranmer, Karen Ng, Wahid Bhimji, Prabhat

We consider the problem of Bayesian inference in the family of probabilistic models implicitly defined by stochastic generative models of data. In scientific fields ranging from population biology to cosmology, low-level mechanistic components are composed to create complex generative models. These models lead to intractable likelihoods and are typically non-differentiable, which poses challenges for traditional approaches to inference. We extend previous work in "inference compilation", which combines universal probabilistic programming and deep learning methods, to large-scale scientific simulators, and introduce a C++ based probabilistic programming library called CPProb. We successfully use CPProb to interface with SHERPA, a large code-base used in particle physics. Here we describe the technical innovations realized and planned for this library.

* 7 pages, 2 figures

Learning Disentangled Representations with Semi-Supervised Deep Generative Models

Nov 13, 2017
N. Siddharth, Brooks Paige, Jan-Willem van de Meent, Alban Desmaison, Noah D. Goodman, Pushmeet Kohli, Frank Wood, Philip H. S. Torr

Variational autoencoders (VAEs) learn representations of data by jointly training a probabilistic encoder and decoder network. Typically these models encode all features of the data into a single variable. Here we are interested in learning disentangled representations that encode distinct aspects of the data into separate variables. We propose to learn such representations using model architectures that generalise from standard VAEs, employing a general graphical model structure in the encoder and decoder. This allows us to train partially-specified models that make relatively strong assumptions about a subset of interpretable variables and rely on the flexibility of neural networks to learn representations for the remaining variables. We further define a general objective for semi-supervised learning in this model class, which can be approximated using an importance sampling procedure. We evaluate our framework's ability to learn disentangled representations, both by qualitative exploration of its generative capacity, and quantitative evaluation of its discriminative ability on a variety of models and datasets.

* Accepted for publication at NIPS 2017

Updating the VESICLE-CNN Synapse Detector

Oct 31, 2017
Andrew Warrington, Frank Wood

We present an updated version of the VESICLE-CNN algorithm presented by Roncal et al. (2014). The original implementation makes use of a patch-based approach. This methodology is known to be slow due to repeated computations. We update this implementation to be fully convolutional through the use of dilated convolutions, recovering the expanded field of view achieved through the use of strided maxpools, but without a degradation of spatial resolution. This updated implementation performs as well as the original implementation, but with a $600\times$ speedup at test time. We release source code and data into the public domain.

* Submitted as a two-sided extended abstract to the NIPS 2017 workshop BigNeuro 2017: Analyzing brain data from nano to macroscale
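
To make the dilated-convolution point in the abstract above concrete, here is a minimal one-dimensional sketch. It is not the VESICLE-CNN architecture: `dilated_conv1d` and `receptive_field` are hypothetical helper names, and the kernel and signal are toy values. It only shows that stride-1 dilated layers widen the field of view while the output stays at input resolution, apart from border effects.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D correlation with a gap of `dilation` between kernel taps."""
    span = (len(kernel) - 1) * dilation  # input offset between first and last tap
    return [
        sum(k * signal[i + j * dilation] for j, k in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 layers with the given dilations."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Each output sums two inputs that sit two positions apart.
out = dilated_conv1d([1, 2, 3, 4, 5], [1, 1], dilation=2)
# Three size-3 layers with dilations 1, 2, 4 see 15 input positions per output.
rf = receptive_field(3, [1, 2, 4])
```

Because every layer has stride 1, one pass over the whole input produces a dense output, which is what avoids the repeated per-patch computation.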

Canonical Correlation Forests

Aug 09, 2017
Tom Rainforth, Frank Wood

We introduce canonical correlation forests (CCFs), a new decision tree ensemble method for classification and regression. Individual canonical correlation trees are binary decision trees with hyperplane splits based on local canonical correlation coefficients calculated during training. Unlike axis-aligned alternatives, the decision surfaces of CCFs are not restricted to the coordinate system of the input features and therefore more naturally represent data with correlated inputs. CCFs naturally accommodate multiple outputs, provide a similar computational complexity to random forests, and inherit their impressive robustness to the choice of input parameters. As part of the CCF training algorithm, we also introduce projection bootstrapping, a novel alternative to bagging for oblique decision tree ensembles which maintains use of the full dataset in selecting split points, often leading to improvements in predictive accuracy. Our experiments show that, even without parameter tuning, CCFs outperform axis-aligned random forests and other state-of-the-art tree ensemble methods on both classification and regression problems, delivering both improved predictive accuracy and faster training times. We further show that they outperform all of the 179 classifiers considered in a recent extensive survey.

* Substantial update: longer journal format version which now covers regression and multiple output prediction

Bayesian Optimization for Probabilistic Programs

Jul 13, 2017
Tom Rainforth, Tuan Anh Le, Jan-Willem van de Meent, Michael A. Osborne, Frank Wood

We present the first general-purpose framework for marginal maximum a posteriori estimation of probabilistic program variables. By using a series of code transformations, the evidence of any probabilistic program, and therefore of any graphical model, can be optimized with respect to an arbitrary subset of its sampled variables. To carry out this optimization, we develop the first Bayesian optimization package to directly exploit the source code of its target, leading to innovations in problem-independent hyperpriors, unbounded optimization, and implicit constraint satisfaction, and delivering significant performance improvements over prominent existing packages. We present applications of our method to a number of tasks including engineering design and parameter optimization.

Interacting Particle Markov Chain Monte Carlo

Apr 12, 2017
Tom Rainforth, Christian A. Naesseth, Fredrik Lindsten, Brooks Paige, Jan-Willem van de Meent, Arnaud Doucet, Frank Wood

We introduce interacting particle Markov chain Monte Carlo (iPMCMC), a PMCMC method based on an interacting pool of standard and conditional sequential Monte Carlo samplers. Like related methods, iPMCMC is a Markov chain Monte Carlo sampler on an extended space. We present empirical results that show significant improvements in mixing rates relative to both non-interacting PMCMC samplers, and a single PMCMC sampler with an equivalent memory and computational budget. An additional advantage of the iPMCMC method is that it is suitable for distributed and multi-core architectures.

* JMLR W&CP 48:2616-2625, 2016

Using Synthetic Data to Train Neural Networks is Model-Based Reasoning

Mar 02, 2017
Tuan Anh Le, Atilim Gunes Baydin, Robert Zinkov, Frank Wood

We draw a formal connection between using synthetic training data to optimize neural network parameters and approximate, Bayesian, model-based reasoning. In particular, training a neural network using synthetic data can be viewed as learning a proposal distribution generator for approximate inference in the synthetic-data generative model. We demonstrate this connection in a recognition task where we develop a novel Captcha-breaking architecture and train it using synthetic data, demonstrating both state-of-the-art performance and a way of computing task-specific posterior uncertainty. Using a neural network trained this way, we also demonstrate successful breaking of real-world Captchas currently used by Facebook and Wikipedia. Reasoning from these empirical results and drawing connections with Bayesian modeling, we discuss the robustness of synthetic data results and suggest important considerations for ensuring good neural network generalization when training with synthetic data.

* 8 pages, 4 figures

Inference Compilation and Universal Probabilistic Programming

Mar 02, 2017
Tuan Anh Le, Atilim Gunes Baydin, Frank Wood

We introduce a method for using deep neural networks to amortize the cost of inference in models from the family induced by universal probabilistic programming languages, establishing a framework that combines the strengths of probabilistic programming and deep learning methods. We call what we do "compilation of inference" because our method transforms a denotational specification of an inference problem in the form of a probabilistic program written in a universal programming language into a trained neural network denoted in a neural network specification language. When at test time this neural network is fed observational data and executed, it performs approximate inference in the original model specified by the probabilistic program. Our training objective and learning procedure are designed to allow the trained neural network to be used as a proposal distribution in a sequential importance sampling inference engine. We illustrate our method on mixture models and Captcha solving and show significant speedups in the efficiency of inference.

* In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS 2017), 54:1338--1348. Proceedings of Machine Learning Research. Fort Lauderdale, FL, USA: PMLR
* 11 pages, 6 figures
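
The abstract above describes using a trained network as a proposal distribution for importance sampling. The sketch below illustrates only that last step, on an assumed toy model (prior x ~ N(0, 1), likelihood y | x ~ N(x, 1)). A fixed Gaussian stands in for whatever proposal a trained network would output, and the names `normal_pdf` and `importance_sample` are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of N(mean, std**2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_sample(y, proposal_mean, proposal_std, n, rng):
    """Self-normalized importance sampling for x ~ N(0, 1), y | x ~ N(x, 1)."""
    xs, ws = [], []
    for _ in range(n):
        x = rng.gauss(proposal_mean, proposal_std)
        # weight = prior(x) * likelihood(y | x) / proposal(x)
        w = (normal_pdf(x, 0.0, 1.0) * normal_pdf(y, x, 1.0)
             / normal_pdf(x, proposal_mean, proposal_std))
        xs.append(x)
        ws.append(w)
    total = sum(ws)
    posterior_mean = sum(w * x for w, x in zip(ws, xs)) / total
    effective_sample_size = total ** 2 / sum(w * w for w in ws)
    return posterior_mean, effective_sample_size


rng = random.Random(0)
# Proposing from the prior versus from a proposal matched to the posterior.
mean_prior, ess_prior = importance_sample(2.0, 0.0, 1.0, 5000, rng)
mean_good, ess_good = importance_sample(2.0, 1.0, math.sqrt(0.5), 5000, rng)
```

For this model the exact posterior is N(y/2, 1/2), so the second call uses an ideal proposal: its weights are constant and the effective sample size equals the number of samples. Proposing from the prior wastes part of the sample budget.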

On the Pitfalls of Nested Monte Carlo

Dec 03, 2016
Tom Rainforth, Robert Cornish, Hongseok Yang, Frank Wood

There is an increasing interest in estimating expectations outside of the classical inference framework, such as for models expressed as probabilistic programs. Many of these contexts call for some form of nested inference to be applied. In this paper, we analyse the behaviour of nested Monte Carlo (NMC) schemes, for which classical convergence proofs are insufficient. We give conditions under which NMC will converge, establish a rate of convergence, and provide empirical data that suggests that this rate is observable in practice. Finally, we prove that general-purpose nested inference schemes are inherently biased. Our results serve to warn of the dangers associated with naive composition of inference and models.

* Appearing in NIPS Workshop on Advances in Approximate Bayesian Inference 2016
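
A small toy experiment can illustrate the kind of bias the abstract above warns about. The target, the functions, and the name `nested_mc` are assumptions chosen for this sketch and are not taken from the paper: the quantity estimated is E_y[(E_z[y + z])^2] with y and z independent standard normals, whose true value is 1.

```python
import random


def nested_mc(n_outer, n_inner, rng):
    """Nested Monte Carlo estimate of E_y[(E_z[y + z])**2], y, z ~ N(0, 1)."""
    total = 0.0
    for _ in range(n_outer):
        y = rng.gauss(0.0, 1.0)
        # Inner estimate of E_z[y + z], which equals y exactly.
        inner = sum(y + rng.gauss(0.0, 1.0) for _ in range(n_inner)) / n_inner
        # A nonlinear function of a noisy inner estimate introduces bias.
        total += inner ** 2
    return total / n_outer


rng = random.Random(0)
coarse = nested_mc(5000, 1, rng)    # one inner sample per outer sample
fine = nested_mc(5000, 100, rng)    # one hundred inner samples per outer sample
```

In this toy problem the inner average has variance 1/n_inner, so the expected estimate is 1 + 1/n_inner rather than 1. Adding outer samples alone does not remove the bias; the inner sample size has to grow as well.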
