Srinivas Vasudevan

Learning Energy-based Model with Flow-based Backbone by Neural Transport MCMC

Jun 12, 2020
Erik Nijkamp, Ruiqi Gao, Pavel Sountsov, Srinivas Vasudevan, Bo Pang, Song-Chun Zhu, Ying Nian Wu

Learning an energy-based model (EBM) requires MCMC sampling of the learned model as the inner loop of the learning algorithm. However, MCMC sampling of an EBM in data space generally does not mix, because the energy function, which is usually parametrized by a deep network, is highly multi-modal in the data space. This is a serious handicap for both the theory and practice of EBMs. In this paper, we propose to learn the EBM with a flow-based model serving as a backbone, so that the EBM is a correction, or exponential tilting, of the flow-based model. We show that the model has a particularly simple form in the space of the latent variables of the flow-based model, and that MCMC sampling of the EBM in this latent space, which is a simple special case of neural transport MCMC, mixes well and traverses modes in the data space. This enables proper sampling and learning of EBMs.
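
The construction described above can be written out in a few lines. The following is a brief sketch in assumed notation (energy f_theta, flow density q_alpha with generator g_alpha and base density p_0), not quoted from the paper:

```latex
% Sketch of the exponential-tilting construction; the symbols below are
% assumed notation for illustration, not taken verbatim from the paper.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
The EBM is an exponential tilting of the flow-based backbone,
\[
  p_\theta(x) \;=\; \frac{1}{Z(\theta)}\,\exp\!\big(f_\theta(x)\big)\,q_\alpha(x).
\]
Writing $x = g_\alpha(z)$ for the flow's generator with base density $p_0(z)$,
the Jacobian of $g_\alpha$ cancels against the one inside $q_\alpha$, so the
target in latent space is simply
\[
  \pi_\theta(z) \;\propto\; \exp\!\big(f_\theta(g_\alpha(z))\big)\,p_0(z),
\]
i.e.\ an exponentially tilted Gaussian prior, which is the form that the
neural transport MCMC of the paper samples from.
\end{document}
```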

NeuTra-lizing Bad Geometry in Hamiltonian Monte Carlo Using Neural Transport

Mar 09, 2019
Matthew Hoffman, Pavel Sountsov, Joshua V. Dillon, Ian Langmore, Dustin Tran, Srinivas Vasudevan

Hamiltonian Monte Carlo is a powerful algorithm for sampling from difficult-to-normalize posterior distributions. However, when the geometry of the posterior is unfavorable, it may take many expensive evaluations of the target distribution and its gradient to converge and mix. We propose neural transport (NeuTra) HMC, a technique for learning to correct this sort of unfavorable geometry using inverse autoregressive flows (IAF), a powerful neural variational inference technique. The IAF is trained to minimize the KL divergence from an isotropic Gaussian to the warped posterior, and then HMC sampling is performed in the warped space. We evaluate NeuTra HMC on a variety of synthetic and real problems, and find that it significantly outperforms vanilla HMC both in time to reach the stationary distribution and in asymptotic effective-sample-size rates.
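
As a rough sketch of this two-stage recipe, the snippet below uses stock TensorFlow Probability components rather than the authors' released implementation; the ill-conditioned Gaussian target, network sizes, step sizes, and iteration counts are illustrative assumptions, not the paper's experimental setup:

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd, tfb = tfp.distributions, tfp.bijectors

DIM = 10
# Illustrative badly-scaled Gaussian target (an assumption, not the paper's benchmark).
target = tfd.MultivariateNormalDiag(
    loc=tf.zeros(DIM),
    scale_diag=tf.constant([10.0 ** (-i / 3.0) for i in range(DIM)]))

# Inverse autoregressive flow: invert a masked autoregressive flow so that
# sampling (base -> posterior direction) is the cheap pass.
iaf = tfb.Invert(tfb.MaskedAutoregressiveFlow(
    shift_and_log_scale_fn=tfb.AutoregressiveNetwork(
        params=2, hidden_units=[32, 32])))
q = tfd.TransformedDistribution(
    distribution=tfd.MultivariateNormalDiag(loc=tf.zeros(DIM)),
    bijector=iaf)

# Stage 1: fit the flow by minimizing KL(q || target) with reparameterized samples.
opt = tf.keras.optimizers.Adam(1e-2)
for _ in range(1000):
    with tf.GradientTape() as tape:
        z = q.sample(128)
        loss = tf.reduce_mean(q.log_prob(z) - target.log_prob(z))
    opt.apply_gradients(zip(tape.gradient(loss, q.trainable_variables),
                            q.trainable_variables))

# Stage 2: run HMC through the trained bijector, i.e. in the warped space.
hmc = tfp.mcmc.HamiltonianMonteCarlo(
    target_log_prob_fn=target.log_prob, step_size=0.1, num_leapfrog_steps=8)
neutra_kernel = tfp.mcmc.TransformedTransitionKernel(
    inner_kernel=hmc, bijector=iaf)
samples = tfp.mcmc.sample_chain(
    num_results=500, current_state=tf.zeros(DIM),
    kernel=neutra_kernel, num_burnin_steps=200, trace_fn=None)
```

The same pattern works with any trainable bijector; the IAF here matches the flow family named in the abstract.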

Simple, Distributed, and Accelerated Probabilistic Programming

Nov 29, 2018
Dustin Tran, Matthew Hoffman, Dave Moore, Christopher Suter, Srinivas Vasudevan, Alexey Radul, Matthew Johnson, Rif A. Saurous

We describe a simple, low-level approach for embedding probabilistic programming in a deep learning ecosystem. In particular, we distill probabilistic programming down to a single abstraction: the random variable. Our lightweight implementation in TensorFlow enables numerous applications: a model-parallel variational auto-encoder (VAE) with 2nd-generation tensor processing units (TPUv2s); a data-parallel autoregressive model (Image Transformer) with TPUv2s; and a multi-GPU No-U-Turn Sampler (NUTS). For both a state-of-the-art VAE on 64x64 ImageNet and an Image Transformer on 256x256 CelebA-HQ, our approach achieves an optimal linear speedup from 1 to 256 TPUv2 chips. With NUTS, we see a 100x speedup on GPUs over Stan and a 37x speedup over PyMC3.
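
A minimal sketch of that single abstraction is shown below, using the open-source edward2 package; the import path and the toy regression model are assumptions for illustration, not code from the paper:

```python
import tensorflow as tf
import edward2 as ed  # assumed packaging; the paper's code shipped under a TensorFlow namespace


def linear_regression(features):
    # Random variables behave like Tensors, so a model is just a Python function.
    w = ed.Normal(loc=tf.zeros(features.shape[-1]), scale=1., name="w")
    b = ed.Normal(loc=0., scale=1., name="b")
    return ed.Normal(loc=tf.tensordot(features, w, axes=1) + b,
                     scale=0.1, name="y")


# Calling the function runs the generative process and samples an output.
outputs = linear_regression(tf.random.normal([32, 5]))
```

Program transformations for inference (e.g., conditioning on data or tracing the random-variable constructions) are then applied over functions like this one, which is what keeps the abstraction lightweight enough to run on TPU and multi-GPU backends.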

* Appears in Neural Information Processing Systems, 2018. Code available at http://bit.ly/2JpFipt 

TensorFlow Distributions

Nov 28, 2017
Joshua V. Dillon, Ian Langmore, Dustin Tran, Eugene Brevdo, Srinivas Vasudevan, Dave Moore, Brian Patton, Alex Alemi, Matt Hoffman, Rif A. Saurous

The TensorFlow Distributions library implements a vision of probability theory adapted to the modern deep-learning paradigm of end-to-end differentiable computation. Building on two basic abstractions, it offers flexible building blocks for probabilistic computation. Distributions provide fast, numerically stable methods for generating samples and computing statistics, e.g., log density. Bijectors provide composable volume-tracking transformations with automatic caching. Together these enable modular construction of high dimensional distributions and transformations not possible with previous libraries (e.g., pixelCNNs, autoregressive flows, and reversible residual networks). They are the workhorse behind deep probabilistic programming systems like Edward and empower fast black-box inference in probabilistic models built on deep-network components. TensorFlow Distributions has proven an important part of the TensorFlow toolkit within Google and in the broader deep learning community.
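
To make the two abstractions concrete, here is a short example using the current TensorFlow Probability packaging of the library (tfp.distributions and tfp.bijectors); the particular chained bijector is an illustrative choice, not an example from the paper:

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd, tfb = tfp.distributions, tfp.bijectors

# Distribution: fast, numerically stable sampling and statistics such as log density.
normal = tfd.Normal(loc=0., scale=1.)
x = normal.sample(5)
log_density = normal.log_prob(x)

# Bijector: a composable, volume-tracking transformation. Chaining scale,
# shift, and exp (applied right to left) turns the standard normal base
# into a log-normal whose underlying normal has mean 0.5 and scale 2.
lognormal = tfd.TransformedDistribution(
    distribution=normal,
    bijector=tfb.Chain([tfb.Exp(), tfb.Shift(0.5), tfb.Scale(2.)]))
print(lognormal.log_prob(1.0), lognormal.sample(3))
```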
