Dustin Tran

Implicit Causal Models for Genome-wide Association Studies

Oct 30, 2017
Dustin Tran, David M. Blei

Progress in probabilistic generative models has accelerated, developing richer models with neural architectures, implicit densities, and with scalable algorithms for their Bayesian inference. However, there has been limited progress in models that capture causal relationships, for example, how individual genetic factors cause major human diseases. In this work, we focus on two challenges in particular: How do we build richer causal models, which can capture highly nonlinear relationships and interactions between multiple causes? How do we adjust for latent confounders, which are variables influencing both cause and effect and which prevent learning of causal relationships? To address these challenges, we synthesize ideas from causality and modern probabilistic modeling. For the first, we describe implicit causal models, a class of causal models that leverages neural architectures with an implicit density. For the second, we describe an implicit causal model that adjusts for confounders by sharing strength across examples. In experiments, we scale Bayesian inference on up to a billion genetic measurements. We achieve state-of-the-art accuracy for identifying causal factors: we significantly outperform existing genetics methods by an absolute difference of 15-45.3%.

Deep Probabilistic Programming

Mar 07, 2017
Dustin Tran, Matthew D. Hoffman, Rif A. Saurous, Eugene Brevdo, Kevin Murphy, David M. Blei

We propose Edward, a Turing-complete probabilistic programming language. Edward defines two compositional representations: random variables and inference. By treating inference as a first class citizen, on a par with modeling, we show that probabilistic programming can be as flexible and computationally efficient as traditional deep learning. For flexibility, Edward makes it easy to fit the same model using a variety of composable inference methods, ranging from point estimation to variational inference to MCMC. In addition, Edward can reuse the modeling representation as part of inference, facilitating the design of rich variational models and generative adversarial networks. For efficiency, Edward is integrated into TensorFlow, providing significant speedups over existing probabilistic systems. For example, we show on a benchmark logistic regression task that Edward is at least 35x faster than Stan and 6x faster than PyMC3. Further, Edward incurs no runtime overhead: it is as fast as handwritten TensorFlow.

* Appears in International Conference on Learning Representations, 2017. A companion webpage for this paper is available at http://edwardlib.org/iclr2017

Edward: A library for probabilistic modeling, inference, and criticism

Feb 01, 2017
Dustin Tran, Alp Kucukelbir, Adji B. Dieng, Maja Rudolph, Dawen Liang, David M. Blei

Probabilistic modeling is a powerful approach for analyzing empirical information. We describe Edward, a library for probabilistic modeling. Edward's design reflects an iterative process pioneered by George Box: build a model of a phenomenon, make inferences about the model given data, and criticize the model's fit to the data. Edward supports a broad class of probabilistic models, efficient algorithms for inference, and many techniques for model criticism. The library builds on top of TensorFlow to support distributed training and hardware such as GPUs. Edward enables the development of complex probabilistic models and their algorithms at a massive scale.

Towards stability and optimality in stochastic gradient descent

Jun 07, 2016
Panos Toulis, Dustin Tran, Edoardo M. Airoldi

Iterative procedures for parameter estimation based on stochastic gradient descent allow the estimation to scale to massive data sets. However, in both theory and practice, they suffer from numerical instability. Moreover, they are statistically inefficient as estimators of the true parameter value. To address these two issues, we propose a new iterative procedure termed averaged implicit SGD (AI-SGD). For statistical efficiency, AI-SGD employs averaging of the iterates, which achieves the optimal Cramér-Rao bound under strong convexity, i.e., it is an optimal unbiased estimator of the true parameter value. For numerical stability, AI-SGD employs an implicit update at each iteration, which is related to proximal operators in optimization. In practice, AI-SGD achieves competitive performance with other state-of-the-art procedures. Furthermore, it is more stable than averaging procedures that do not employ proximal updates, and is simple to implement as it requires fewer tunable hyperparameters than procedures that do employ proximal updates.

* Appears in Artificial Intelligence and Statistics, 2016
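
The two ingredients named in the abstract above, an implicit update and averaging of the iterates, can be illustrated on ordinary least squares, where the implicit update has a closed form. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names `ai_sgd` and `make_data`, the squared-error loss, and the learning-rate schedule `lr0 * n**(-decay)` are all choices made for this illustration.

```python
import random


def ai_sgd(data, lr0=1.0, decay=0.6):
    """Averaged implicit SGD for least squares (illustrative sketch).

    data: iterable of (x, y) pairs, where x is a list of floats.
    The schedule lr_n = lr0 * n**(-decay) is an assumption of this sketch.
    Returns (last_iterate, averaged_iterate).
    """
    theta = None
    theta_bar = None
    for n, (x, y) in enumerate(data, start=1):
        if theta is None:
            theta = [0.0] * len(x)
            theta_bar = [0.0] * len(x)
        lr = lr0 * n ** (-decay)
        residual = y - sum(xi * ti for xi, ti in zip(x, theta))
        # For squared error, solving the implicit equation
        #   theta_n = theta_{n-1} + lr * (y - x . theta_n) * x
        # gives the explicit step scaled by 1 / (1 + lr * ||x||^2).
        shrink = 1.0 / (1.0 + lr * sum(xi * xi for xi in x))
        theta = [ti + lr * shrink * residual * xi for ti, xi in zip(theta, x)]
        # Running mean of the iterates theta_1, ..., theta_n.
        theta_bar = [b + (ti - b) / n for b, ti in zip(theta_bar, theta)]
    return theta, theta_bar


def make_data(n, true_theta, noise_sd=0.1, seed=0):
    """Synthetic linear-regression data (hypothetical helper for the demo)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in true_theta]
        y = sum(xi * ti for xi, ti in zip(x, true_theta)) + rng.gauss(0.0, noise_sd)
        data.append((x, y))
    return data


if __name__ == "__main__":
    last, averaged = ai_sgd(make_data(5000, [2.0, -1.0]))
    print("last iterate:    ", last)
    print("averaged iterate:", averaged)
```

The averaged iterate is returned alongside the last iterate so the two can be compared on the same data stream; the average is maintained as a running mean, so no history of iterates needs to be stored.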

Hierarchical Variational Models

May 30, 2016
Rajesh Ranganath, Dustin Tran, David M. Blei

Black box variational inference allows researchers to easily prototype and evaluate an array of models. Recent advances allow such algorithms to scale to high dimensions. However, a central question remains: How to specify an expressive variational distribution that maintains efficient computation? To address this, we develop hierarchical variational models (HVMs). HVMs augment a variational approximation with a prior on its parameters, which allows it to capture complex structure for both discrete and continuous latent variables. The algorithm we develop is black box, can be used for any HVM, and has the same computational efficiency as the original approximation. We study HVMs on a variety of deep discrete latent variable models. HVMs generalize other expressive variational distributions and maintain higher fidelity to the posterior.

* Appears in International Conference on Machine Learning, 2016

The Variational Gaussian Process

Apr 17, 2016
Dustin Tran, Rajesh Ranganath, David M. Blei

Variational inference is a powerful tool for approximate inference, and it has been recently applied for representation learning with deep generative models. We develop the variational Gaussian process (VGP), a Bayesian nonparametric variational family, which adapts its shape to match complex posterior distributions. The VGP generates approximate posterior samples by generating latent inputs and warping them through random non-linear mappings; the distribution over random mappings is learned during inference, enabling the transformed outputs to adapt to varying complexity. We prove a universal approximation theorem for the VGP, demonstrating its representative power for learning any model. For inference we present a variational objective inspired by auto-encoders and perform black box inference over a wide class of models. The VGP achieves new state-of-the-art results for unsupervised learning, inferring models such as the deep latent Gaussian model and the recently proposed DRAW.

* Appears in International Conference on Learning Representations, 2016

Spectral M-estimation with Applications to Hidden Markov Models

Mar 29, 2016
Dustin Tran, Minjae Kim, Finale Doshi-Velez

Method-of-moments estimators exhibit appealing statistical properties, such as asymptotic unbiasedness, for nonconvex problems. However, they typically require a large number of samples and are extremely sensitive to model misspecification. In this paper, we apply the framework of M-estimation to develop both a generalized method of moments procedure and a principled method for regularization. Our proposed M-estimator obtains optimal sample efficiency rates (in the class of moment-based estimators) and the same well-known rates on prediction accuracy as other spectral estimators. It also makes it straightforward to incorporate regularization into the sample moment conditions. We demonstrate empirically the gains in sample efficiency from our approach on hidden Markov models.

* Appears in Artificial Intelligence and Statistics, 2016

Automatic Differentiation Variational Inference

Mar 02, 2016
Alp Kucukelbir, Dustin Tran, Rajesh Ranganath, Andrew Gelman, David M. Blei

Probabilistic modeling is iterative. A scientist posits a simple model, fits it to her data, refines it according to her analysis, and repeats. However, fitting complex models to large data is a bottleneck in this process. Deriving algorithms for new models can be both mathematically and computationally challenging, which makes it difficult to efficiently cycle through the steps. To this end, we develop automatic differentiation variational inference (ADVI). Using our method, the scientist only provides a probabilistic model and a dataset, nothing else. ADVI automatically derives an efficient variational inference algorithm, freeing the scientist to refine and explore many models. ADVI supports a broad class of models; no conjugacy assumptions are required. We study ADVI across ten different models and apply it to a dataset with millions of observations. ADVI is integrated into Stan, a probabilistic programming system; it is available for immediate use.

Copula variational inference

Oct 31, 2015
Dustin Tran, David M. Blei, Edoardo M. Airoldi

We develop a general variational inference method that preserves dependency among the latent variables. Our method uses copulas to augment the families of distributions used in mean-field and structured approximations. Copulas model the dependency that is not captured by the original variational distribution, and thus the augmented variational family guarantees better approximations to the posterior. With stochastic optimization, inference on the augmented distribution is scalable. Furthermore, our strategy is generic: it can be applied to any inference procedure that currently uses the mean-field or structured approach. Copula variational inference has many advantages: it reduces bias; it is less sensitive to local optima; it is less sensitive to hyperparameters; and it helps characterize and interpret the dependency among the latent variables.

* Appears in Neural Information Processing Systems, 2015

Stochastic gradient descent methods for estimation with large data sets

Sep 22, 2015
Dustin Tran, Panos Toulis, Edoardo M. Airoldi

We develop methods for parameter estimation in settings with large-scale data sets, where traditional methods are no longer tenable. Our methods rely on stochastic approximations, which are computationally efficient as they maintain one iterate as a parameter estimate, and successively update that iterate based on a single data point. When the update is based on a noisy gradient, the stochastic approximation is known as standard stochastic gradient descent, which has been fundamental in modern applications with large data sets. Additionally, our methods are numerically stable because they employ implicit updates of the iterates. Intuitively, an implicit update is a shrunken version of a standard one, where the shrinkage factor depends on the observed Fisher information at the corresponding data point. This shrinkage prevents numerical divergence of the iterates, which can be caused either by excess noise or outliers. Our sgd package in R offers the most extensive and robust implementation of stochastic gradient descent methods. We demonstrate that sgd dominates alternative software in runtime for several estimation problems with massive data sets. Our applications include the wide class of generalized linear models as well as M-estimation for robust regression.
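
The shrinkage described in the abstract above can be made concrete for the linear model with squared-error loss, where the implicit equation can be solved by hand. The sketch below is an illustration under assumptions and is not the API of the sgd R package: the function names and the single-data-point demonstration are hypothetical. For this model, the implicit step equals the standard step multiplied by 1 / (1 + lr * ||x||^2).

```python
def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def explicit_sgd_step(theta, x, y, lr):
    """Standard SGD step for squared error: the gradient uses the old iterate."""
    residual = y - dot(x, theta)
    return [t + lr * residual * xi for t, xi in zip(theta, x)]


def implicit_sgd_step(theta, x, y, lr):
    """Implicit SGD step for squared error.

    The update is defined by
        theta_new = theta + lr * (y - x . theta_new) * x,
    which for a linear model solves to the explicit step scaled by
    the shrinkage factor 1 / (1 + lr * ||x||^2).
    """
    residual = y - dot(x, theta)
    shrink = 1.0 / (1.0 + lr * dot(x, x))
    return [t + lr * shrink * residual * xi for t, xi in zip(theta, x)]


def run(step, x, y, lr, n_steps):
    """Apply the same update rule repeatedly to one data point, from zero."""
    theta = [0.0] * len(x)
    for _ in range(n_steps):
        theta = step(theta, x, y, lr)
    return theta


if __name__ == "__main__":
    x, y, lr = [3.0, 4.0], 1.0, 1.0  # deliberately large learning rate
    for name, step in [("explicit", explicit_sgd_step),
                       ("implicit", implicit_sgd_step)]:
        theta = run(step, x, y, lr, n_steps=10)
        print(name, "residual after 10 steps:", y - dot(x, theta))
```

With this deliberately large learning rate, each explicit step multiplies the residual by 1 - lr * ||x||^2 = -24, so the iterates diverge, while each implicit step multiplies it by 1 / (1 + lr * ||x||^2) = 1/26, so the iterates stay bounded and converge.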
