Jan-Willem van de Meent

Evaluating Combinatorial Generalization in Variational Autoencoders

Nov 11, 2019

Alican Bozkurt, Babak Esmaeili, Dana H. Brooks, Jennifer G. Dy, Jan-Willem van de Meent

Abstract: We evaluate the ability of variational autoencoders to generalize to unseen examples in domains with a large combinatorial space of feature values. Our experiments systematically evaluate the effect of network width, depth, regularization, and the typical distance between the training and test examples. Increasing network capacity benefits generalization in easy problems, where test-set examples are similar to training examples. In more difficult problems, increasing capacity deteriorates generalization when optimizing the standard VAE objective, but once again improves generalization when we decrease the KL regularization. Our results establish that the interplay between model capacity and KL regularization is not clear-cut; we need to take the typical distance between train and test examples into account when evaluating generalization.
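
The abstract above contrasts the standard VAE objective with one that has decreased KL regularization. A minimal sketch of that idea, assuming a diagonal Gaussian encoder, a standard normal prior, and a scalar weight `beta` on the KL term (the function names and the `beta` parameter are illustrative assumptions, not taken from the paper):

```python
import math

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))

def weighted_vae_loss(recon_nll, mu, log_var, beta=1.0):
    """Negative ELBO with the KL term scaled by beta.

    recon_nll is the reconstruction negative log-likelihood for one example.
    beta=1.0 gives the standard objective; beta < 1.0 weakens the KL
    regularization (an assumed way of "decreasing" it).
    """
    return recon_nll + beta * gaussian_kl(mu, log_var)
```

With `beta=1.0` this reduces to the usual negative evidence lower bound; lowering `beta` trades prior matching for reconstruction accuracy.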

Amortized Population Gibbs Samplers with Neural Sufficient Statistics

Nov 04, 2019

Hao Wu, Heiko Zimmermann, Eli Sennesh, Tuan Anh Le, Jan-Willem van de Meent

Abstract: We develop amortized population Gibbs (APG) samplers, a new class of autoencoding variational methods for deep probabilistic models. APG samplers construct high-dimensional proposals by iterating over updates to lower-dimensional blocks of variables. Each conditional update is a neural proposal, which we train by minimizing the inclusive KL divergence relative to the conditional posterior. To appropriately account for the size of the input data, we develop a new parameterization in terms of neural sufficient statistics, resulting in quasi-conjugate variational approximations. Experiments demonstrate that learned proposals converge to the known analytical conditional posterior in conjugate models, and that APG samplers can learn inference networks for highly structured deep generative models when the conditional posteriors are intractable. Here APG samplers offer a path toward scaling up stochastic variational methods to models in which standard autoencoding architectures fail to produce accurate samples.

Neural Topographic Factor Analysis for fMRI Data

Jun 21, 2019

Eli Sennesh, Zulqarnain Khan, Jennifer Dy, Ajay B. Satpute, J. Benjamin Hutchinson, Jan-Willem van de Meent

Abstract: Neuroimaging experiments produce a large volume (gigabytes) of high-dimensional spatio-temporal data for a small number of sampled participants and stimuli. Analyses of this data commonly compute averages over all trials, ignoring variation within groups of participants and stimuli. To enable the analysis of fMRI data without this implicit assumption of uniformity, we propose Neural Topographic Factor Analysis (NTFA), a deep generative model that parameterizes factors as functions of embeddings for participants and stimuli. We evaluate NTFA on a synthetically generated dataset as well as on three datasets from fMRI experiments. Our results demonstrate that NTFA yields more accurate reconstructions than a state-of-the-art method with fewer parameters. Moreover, learned embeddings uncover latent categories of participants and stimuli, which suggests that NTFA takes a first step towards reasoning about individual variation in fMRI experiments.

Structured Neural Topic Models for Reviews

Jan 02, 2019

Babak Esmaeili, Hongyi Huang, Byron C. Wallace, Jan-Willem van de Meent

Abstract: We present Variational Aspect-based Latent Topic Allocation (VALTA), a family of autoencoding topic models that learn aspect-based representations of reviews. VALTA defines a user-item encoder that maps bag-of-words vectors for combined reviews associated with each paired user and item onto structured embeddings, which in turn define per-aspect topic weights. We model individual reviews in a structured manner by inferring an aspect assignment for each sentence in a given review, where the per-aspect topic weights obtained by the user-item encoder serve to define a mixture over topics, conditioned on the aspect. The result is an autoencoding neural topic model for reviews, which can be trained in a fully unsupervised manner to learn topics that are structured into aspects. Experimental evaluation on a large number of datasets demonstrates that aspects are interpretable, yield higher coherence scores than non-structured autoencoding topic model variants, and can be utilized to perform aspect-based comparison and genre discovery.

Can VAEs Generate Novel Examples?

Dec 22, 2018

Alican Bozkurt, Babak Esmaeili, Dana H. Brooks, Jennifer G. Dy, Jan-Willem van de Meent

Abstract: An implicit goal in works on deep generative models is that such models should be able to generate novel examples that were not previously seen in the training data. In this paper, we investigate to what extent this property holds for widely employed variational autoencoder (VAE) architectures. VAEs maximize a lower bound on the log marginal likelihood, which implies that they will in principle overfit the training data when provided with a sufficiently expressive decoder. In the limit of an infinite capacity decoder, the optimal generative model is a uniform mixture over the training data. More generally, an optimal decoder should output a weighted average over the examples in the training data, where the magnitude of the weights is determined by the proximity in the latent space. This leads to the hypothesis that, for a sufficiently high capacity encoder and decoder, the VAE decoder will perform nearest-neighbor matching according to the coordinates in the latent space. To test this hypothesis, we investigate generalization on the MNIST dataset. We consider both generalization to new examples of previously seen classes, and generalization to the classes that were withheld from the training set. In both cases, we find that reconstructions are closely approximated by nearest neighbors for higher-dimensional parameterizations. When generalizing to unseen classes, however, lower-dimensional parameterizations offer a clear advantage.

* Presented at Critiquing and Correcting Trends in Machine Learning Workshop at NeurIPS 2018
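
The abstract above describes a decoder that outputs a weighted average over training examples, with weights set by proximity in the latent space. A minimal sketch of such a decoder, assuming normalized Gaussian-kernel weights with a hypothetical `bandwidth` parameter (the exact weighting used in the paper is not given in the abstract):

```python
import math

def kernel_weighted_decoder(z, train_codes, train_examples, bandwidth=1.0):
    """Return a proximity-weighted average of the training examples.

    z              : latent coordinates of the query point
    train_codes    : latent coordinates of each training example
    train_examples : the training examples, as equal-length lists of floats
    bandwidth      : assumed Gaussian kernel width (illustrative only)
    """
    # Unnormalized log-weights from squared latent distance.
    logits = [
        -sum((a - b) ** 2 for a, b in zip(z, code)) / (2.0 * bandwidth ** 2)
        for code in train_codes
    ]
    # Normalize in a numerically stable way.
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(train_examples[0])
    return [
        sum(w * x[d] for w, x in zip(weights, train_examples))
        for d in range(dim)
    ]
```

As `bandwidth` shrinks, the weights concentrate on the closest training code and the output approaches nearest-neighbor matching, which is the behavior the hypothesis in the abstract refers to.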

Modeling Theory of Mind for Autonomous Agents with Probabilistic Programs

Dec 04, 2018

Iris Rubi Seaman, Jan-Willem van de Meent, David Wingate

Abstract: As autonomous agents become more ubiquitous, they will eventually have to reason about the mental state of other agents, including those agents' beliefs, desires and goals - so-called theory of mind reasoning. We introduce a collection of increasingly complex theory of mind models of a "chaser" pursuing a "runner", known as the Chaser-Runner model. We show that our implementation is a relatively straightforward theory of mind model that can capture a variety of rich behaviors, which, in turn, increase runner detection rates relative to basic (non-theory-of-mind) models. In addition, our paper demonstrates that (1) using a planning-as-inference formulation based on nested importance sampling results in agents simultaneously reasoning about other agents' plans and crafting counter-plans, (2) probabilistic programming is a natural way to describe models in which each agent uses complex primitives such as path planners to make decisions, and (3) allocating additional computation to perform nested reasoning about agents results in lower-variance estimates of expected utility.

Composing Modeling and Inference Operations with Probabilistic Program Combinators

Nov 29, 2018

Eli Sennesh, Adam Ścibior, Hao Wu, Jan-Willem van de Meent

Abstract: Probabilistic programs with dynamic computation graphs can define measures over sample spaces with unbounded dimensionality, which constitute programmatic analogues to Bayesian nonparametrics. Owing to the generality of this model class, inference relies on 'black-box' Monte Carlo methods that are often not able to take advantage of conditional independence and exchangeability, which have historically been the cornerstones of efficient inference. We here seek to develop a 'middle ground' between probabilistic models with fully dynamic and fully static computation graphs. To this end, we introduce a combinator library for the Probabilistic Torch framework. Combinators are functions that accept models and return transformed models. We assume that models are dynamic, but that model composition is static, in the sense that combinator application takes place prior to evaluating the model on data. Combinators provide primitives for both model and inference composition. Model combinators take the form of classic functional programming constructs such as map and reduce. These constructs define a computation graph at a coarsened level of representation, in which nodes correspond to models, rather than individual variables. Inference combinators implement operations such as importance resampling and application of a transition kernel, which alter the evaluation strategy for a model whilst preserving proper weighting. Owing to this property, models defined using combinators can be trained using stochastic methods that optimize either variational or wake-sleep style objectives. As a validation of this principle, we use combinators to implement black box inference for hidden Markov models.

* Published at the NeurIPS workshop "All of Bayesian Nonparametrics (Especially the Useful Bits)" 2018 (https://sites.google.com/view/nipsbnp2018/)
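
The abstract above describes combinators as functions that accept models and return transformed models. A minimal sketch of that pattern in plain Python, assuming a model is any callable `model(x, rng)` that returns a `(value, log_weight)` pair; this convention and the names `map_model`, `importance_resample`, and `noisy_obs` are hypothetical and are not the Probabilistic Torch API:

```python
import math
import random

def map_model(model):
    """Model combinator: lift a model over one item to a model over a list."""
    def mapped(items, rng):
        results = [model(item, rng) for item in items]
        values = [v for v, _ in results]
        # Independent items: log-weights add.
        return values, sum(lw for _, lw in results)
    return mapped

def importance_resample(model, num_particles):
    """Inference combinator: run the model several times, keep one sample.

    The returned sample carries the log of the average weight, which keeps
    the (value, weight) pair properly weighted.
    """
    def resampled(x, rng):
        particles = [model(x, rng) for _ in range(num_particles)]
        log_ws = [lw for _, lw in particles]
        top = max(log_ws)
        ws = [math.exp(lw - top) for lw in log_ws]
        log_mean = top + math.log(sum(ws) / num_particles)
        value = rng.choices([v for v, _ in particles], weights=ws, k=1)[0]
        return value, log_mean
    return resampled

def noisy_obs(x, rng):
    """Toy model: propose z from a standard normal prior, weight by fit to x."""
    z = rng.gauss(0.0, 1.0)
    return z, -0.5 * (x - z) ** 2  # unnormalized Gaussian log-likelihood

# Composition happens before any data is seen, as the abstract describes.
composed = map_model(importance_resample(noisy_obs, num_particles=10))
values, log_weight = composed([0.0, 1.0, 2.0], random.Random(0))
```

Because both combinators return callables with the same signature as their input, they can be nested in any order.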

On Exploration, Exploitation and Learning in Adaptive Importance Sampling

Oct 31, 2018

Xiaoyu Lu, Tom Rainforth, Yuan Zhou, Jan-Willem van de Meent, Yee Whye Teh

Abstract: We study adaptive importance sampling (AIS) as an online learning problem and argue for the importance of the trade-off between exploration and exploitation in this adaptation. Borrowing ideas from the bandits literature, we propose Daisee, a partition-based AIS algorithm. We further introduce a notion of regret for AIS and show that Daisee has $\mathcal{O}(\sqrt{T}(\log T)^{\frac{3}{4}})$ cumulative pseudo-regret, where $T$ is the number of iterations. We then extend Daisee to adaptively learn a hierarchical partitioning of the sample space for more efficient sampling and confirm the performance of both algorithms empirically.

An Introduction to Probabilistic Programming

Sep 27, 2018

Jan-Willem van de Meent, Brooks Paige, Hongseok Yang, Frank Wood

Abstract: This document is designed to be a first-year graduate-level introduction to probabilistic programming. It not only provides a thorough background for anyone wishing to use a probabilistic programming system, but also introduces the techniques needed to design and build these systems. It is aimed at people who have an undergraduate-level understanding of either or, ideally, both probabilistic machine learning and programming languages. We start with a discussion of model-based reasoning and explain why conditioning as a foundational computation is central to the fields of probabilistic machine learning and artificial intelligence. We then introduce a simple first-order probabilistic programming language (PPL) whose programs define static-computation-graph, finite-variable-cardinality models. In the context of this restricted PPL we introduce fundamental inference algorithms and describe how they can be implemented in the context of models denoted by probabilistic programs. In the second part of this document, we introduce a higher-order probabilistic programming language, with a functionality analogous to that of established programming languages. This affords the opportunity to define models with dynamic computation graphs, at the cost of requiring inference methods that generate samples by repeatedly executing the program. Foundational inference algorithms for this kind of probabilistic programming language are explained in the context of an interface between program executions and an inference controller. This document closes with a chapter on advanced topics which we believe to be, at the time of writing, interesting directions for probabilistic programming research; directions that point towards a tight integration with deep neural network research and the development of systems for next-generation artificial intelligence applications.

* Under review at Foundations and Trends in Machine Learning

Learning Disentangled Representations of Texts with Application to Biomedical Abstracts

Sep 03, 2018

Sarthak Jain, Edward Banner, Jan-Willem van de Meent, Iain J. Marshall, Byron C. Wallace

Abstract: We propose a method for learning disentangled representations of texts that code for distinct and complementary aspects, with the aim of affording efficient model transfer and interpretability. To induce disentangled embeddings, we propose an adversarial objective based on the (dis)similarity between triplets of documents with respect to specific aspects. Our motivating application is embedding biomedical abstracts describing clinical trials in a manner that disentangles the populations, interventions, and outcomes in a given trial. We show that our method learns representations that encode these clinically salient aspects, and that these can be effectively used to perform aspect-specific retrieval. We demonstrate that the approach generalizes beyond our motivating application in experiments on two multi-aspect review corpora.

* Accepted to EMNLP 2018
