Jan-Willem van de Meent

Disentangling Representations of Text by Masking Transformers

Apr 14, 2021

Xiongyi Zhang, Jan-Willem van de Meent, Byron C. Wallace

Abstract: Representations from large pretrained models such as BERT encode a range of features into monolithic vectors, affording strong predictive accuracy across a multitude of downstream tasks. In this paper we explore whether it is possible to learn disentangled representations by identifying existing subnetworks within pretrained models that encode distinct, complementary aspect representations. Concretely, we learn binary masks over transformer weights or hidden units to uncover subsets of features that correlate with a specific factor of variation; this eliminates the need to train a disentangled model from scratch for a particular task. We evaluate this method with respect to its ability to disentangle representations of sentiment from genre in movie reviews, "toxicity" from dialect in Tweets, and syntax from semantics. By combining masking with magnitude pruning we find that we can identify sparse subnetworks within BERT that strongly encode particular aspects (e.g., toxicity) while only weakly encoding others (e.g., race). Moreover, despite only learning masks, we find that disentanglement-via-masking performs as well as -- and often better than -- previously proposed methods based on variational autoencoders and adversarial training.
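The magnitude-pruning step mentioned in this abstract can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: it operates on a flat list of weights, the function names `magnitude_prune_mask` and `apply_mask` are hypothetical, and the learned (trained) binary masks described in the paper are not shown here.

```python
def magnitude_prune_mask(weights, sparsity):
    """Return a 0/1 mask that drops the `sparsity` fraction of
    smallest-magnitude weights (hypothetical helper for illustration)."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def apply_mask(weights, mask):
    """Element-wise product of weights and a binary mask."""
    return [w * m for w, m in zip(weights, mask)]


weights = [0.8, -0.05, 0.3, -1.2, 0.01, 0.4]
mask = magnitude_prune_mask(weights, sparsity=0.5)
sparse_weights = apply_mask(weights, mask)
print(mask)
print(sparse_weights)
```

In a real transformer the mask would have the shape of each weight matrix, and a learned mask could be multiplied in alongside the pruning mask; both details are assumptions beyond what the abstract states.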

* 11 pages, 6 figures

On the Impact of Random Seeds on the Fairness of Clinical Classifiers

Apr 13, 2021

Silvio Amir, Jan-Willem van de Meent, Byron C. Wallace

Abstract: Recent work has shown that fine-tuning large networks is surprisingly sensitive to changes in random seed(s). We explore the implications of this phenomenon for model fairness across demographic groups in clinical prediction tasks over electronic health records (EHR) in MIMIC-III -- the standard dataset in clinical NLP research. Apparent subgroup performance varies substantially for seeds that yield similar overall performance, although there is no evidence of a trade-off between overall and subgroup performance. However, we also find that the small sample sizes inherent to looking at intersections of minority groups and somewhat rare conditions limit our ability to accurately estimate disparities. Further, we find that jointly optimizing for high overall performance and low disparities does not yield statistically significant improvements. Our results suggest that fairness work using MIMIC-III should carefully account for variations in apparent differences that may arise from stochasticity and small sample sizes.

* Accepted for publication at NAACL 2021

Learning Proposals for Probabilistic Programs with Inference Combinators

Mar 03, 2021

Sam Stites, Heiko Zimmermann, Hao Wu, Eli Sennesh, Jan-Willem van de Meent

Abstract: We develop operators for construction of proposals in probabilistic programs, which we refer to as inference combinators. Inference combinators define a grammar over importance samplers that compose primitive operations such as application of a transition kernel and importance resampling. Proposals in these samplers can be parameterized using neural networks, which in turn can be trained by optimizing variational objectives. The result is a framework for user-programmable variational methods that are correct by construction and can be tailored to specific models. We demonstrate the flexibility of this framework by implementing advanced variational methods based on amortized Gibbs sampling and annealing.
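Importance resampling, one of the primitive operations named in this abstract, can be illustrated with a small standalone sketch. It assumes plain multinomial resampling over a list of particles with log weights; the function name `importance_resample` is hypothetical and this is not the combinator API from the paper.

```python
import math
import random


def importance_resample(particles, log_weights, rng):
    """Multinomial resampling: draw len(particles) particles with
    probability proportional to their importance weights."""
    max_lw = max(log_weights)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(lw - max_lw) for lw in log_weights]
    total = sum(weights)
    probs = [w / total for w in weights]
    resampled = rng.choices(particles, weights=probs, k=len(particles))
    # After resampling, every particle carries the average weight.
    log_avg = max_lw + math.log(total / len(particles))
    return resampled, [log_avg] * len(particles)


rng = random.Random(0)
particles = [0.1, 0.5, 2.0, 3.5]
log_weights = [-3.0, -0.5, -0.2, -4.0]
new_particles, new_log_weights = importance_resample(particles, log_weights, rng)
print(new_particles)
```

Assigning the average weight to every resampled particle keeps the overall estimate of the normalizing constant unchanged, which is the usual convention in sequential Monte Carlo.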

Generator Surgery for Compressed Sensing

Mar 01, 2021

Niklas Smedemark-Margulies, Jung Yeon Park, Max Daniels, Rose Yu, Jan-Willem van de Meent, Paul Hand

Abstract: Image recovery from compressive measurements requires a signal prior for the images being reconstructed. Recent work has explored the use of deep generative models with low latent dimension as signal priors for such problems. However, their recovery performance is limited by high representation error. We introduce a method for achieving low representation error using generators as signal priors. Using a pre-trained generator, we remove one or more initial blocks at test time and optimize over the new, higher-dimensional latent space to recover a target image. Experiments demonstrate significantly improved reconstruction quality for a variety of network architectures. This approach also works well for out-of-training-distribution images and is competitive with other state-of-the-art methods. Our experiments show that test-time architectural modifications can greatly improve the recovery quality of generator signal priors for compressed sensing.

* Code available at: https://github.com/nik-sm/generator-surgery

Action Priors for Large Action Spaces in Robotics

Jan 11, 2021

Ondrej Biza, Dian Wang, Robert Platt, Jan-Willem van de Meent, Lawson L. S. Wong

Abstract: In robotics, it is often not possible to learn useful policies using pure model-free reinforcement learning without significant reward shaping or curriculum learning. As a consequence, many researchers rely on expert demonstrations to guide learning. However, acquiring expert demonstrations can be expensive. This paper proposes an alternative approach where the solutions of previously solved tasks are used to produce an action prior that can facilitate exploration in future tasks. The action prior is a probability distribution over actions that summarizes the set of policies found solving previous tasks. Our results indicate that this approach can be used to solve robotic manipulation problems that would otherwise be infeasible without expert demonstrations.
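To make the idea of an action prior concrete, here is a small tabular sketch. It assumes deterministic policies stored as state-to-action dictionaries and builds the prior from smoothed action counts; the paper's prior is likely learned over large action spaces, so the function `build_action_prior`, the smoothing constant, and the toy states are all illustrative assumptions.

```python
import random
from collections import Counter


def build_action_prior(previous_policies, states, actions, smoothing=1.0):
    """Summarize previously found policies as a per-state distribution
    over actions, using add-`smoothing` counts (illustrative stand-in)."""
    prior = {}
    for s in states:
        counts = Counter(p[s] for p in previous_policies if s in p)
        total = sum(counts.values()) + smoothing * len(actions)
        prior[s] = {a: (counts[a] + smoothing) / total for a in actions}
    return prior


def sample_exploration_action(prior, state, rng):
    """Draw an exploratory action from the prior instead of uniformly."""
    acts = list(prior[state].keys())
    probs = list(prior[state].values())
    return rng.choices(acts, weights=probs, k=1)[0]


policies = [{"s0": "left"}, {"s0": "left"}, {"s0": "right"}]
prior = build_action_prior(policies, states=["s0"], actions=["left", "right"])
print(prior["s0"])
print(sample_exploration_action(prior, "s0", random.Random(0)))
```

Sampling exploratory actions from such a prior biases a new task's exploration toward actions that were useful before, while the smoothing term keeps every action reachable.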

* 12 pages, 9 figures

Improving Few-Shot Visual Classification with Unlabelled Examples

Jun 17, 2020

Peyman Bateni, Jarred Barber, Jan-Willem van de Meent, Frank Wood

Abstract: We propose a transductive meta-learning method that uses unlabelled instances to improve few-shot image classification performance. Our approach combines a regularized Mahalanobis-distance-based soft k-means clustering procedure with a state-of-the-art neural adaptive feature extractor to achieve improved test-time classification accuracy using unlabelled data. We evaluate our method on transductive few-shot learning tasks, in which the goal is to jointly predict labels for query (test) examples given a set of support (training) examples. We achieve new state-of-the-art in-domain performance on Meta-Dataset, and improve accuracy on mini- and tiered-ImageNet as compared to other conditional neural adaptive methods that use the same pre-trained feature extractor.
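One iteration of soft k-means with a Mahalanobis distance can be sketched as follows. To stay dependency-free, this sketch assumes diagonal covariances (per-dimension variances), which is a simplification; the regularization scheme and feature extractor from the paper are not modelled, and all function names are hypothetical.

```python
import math


def mahalanobis_sq_diag(x, mean, var):
    """Squared Mahalanobis distance under a diagonal covariance."""
    return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))


def soft_assign(x, means, variances):
    """Soft cluster responsibilities: softmax of -0.5 * squared distance."""
    logits = [-0.5 * mahalanobis_sq_diag(x, m, v)
              for m, v in zip(means, variances)]
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def update_means(points, resp):
    """Responsibility-weighted mean of the points for each cluster."""
    dim = len(points[0])
    means = []
    for c in range(len(resp[0])):
        weight = sum(r[c] for r in resp)
        means.append([sum(r[c] * p[d] for r, p in zip(resp, points)) / weight
                      for d in range(dim)])
    return means


points = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [-1.1, -0.1]]
means = [[0.5, 0.0], [-0.5, 0.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
resp = [soft_assign(p, means, variances) for p in points]
means = update_means(points, resp)
print(means)
```

In the transductive setting described above, the unlabelled query examples would contribute to the mean updates through their soft responsibilities, which is what lets them refine the class estimates.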

Query-Focused EHR Summarization to Aid Imaging Diagnosis

Apr 26, 2020

Denis Jered McInerney, Borna Dabiri, Anne-Sophie Touret, Geoffrey Young, Jan-Willem van de Meent, Byron C. Wallace

Abstract: Electronic Health Records (EHRs) provide vital contextual information to radiologists and other physicians when making a diagnosis. Unfortunately, because a given patient's record may contain hundreds of notes and reports, identifying relevant information within these in the short time typically allotted to a case is very difficult. We propose and evaluate models that extract relevant text snippets from patient records to provide a rough case summary intended to aid physicians considering one or more diagnoses. This is hard because direct supervision (i.e., physician annotations of snippets relevant to specific diagnoses in medical records) is prohibitively expensive to collect at scale. We propose a distantly supervised strategy in which we use groups of International Classification of Diseases (ICD) codes observed in 'future' records as noisy proxies for 'downstream' diagnoses. Using this we train a transformer-based neural model to perform extractive summarization conditioned on potential diagnoses. This model defines an attention mechanism that is conditioned on potential diagnoses (queries) provided by the diagnosing physician. We train (via distant supervision) and evaluate variants of this model on EHR data from Brigham and Women's Hospital in Boston and MIMIC-III (the latter to facilitate reproducibility). Evaluations performed by radiologists demonstrate that these distantly supervised models yield better extractive summaries than do unsupervised approaches. Such models may aid diagnosis by identifying sentences in past patient reports that are clinically relevant to a potential diagnosis.

Deep Markov Spatio-Temporal Factorization

Mar 22, 2020

Amirreza Farnoosh, Behnaz Rezaei, Eli Zachary Sennesh, Zulqarnain Khan, Jennifer Dy, Ajay Satpute, J Benjamin Hutchinson, Jan-Willem van de Meent, Sarah Ostadabbas

Abstract: We introduce deep Markov spatio-temporal factorization (DMSTF), a deep generative model for spatio-temporal data. Like other factor analysis methods, DMSTF approximates high-dimensional data by a product between time-dependent weights and spatially dependent factors. These weights and factors are in turn represented in terms of lower-dimensional latent variables that we infer using stochastic variational inference. The innovation in DMSTF is that we parameterize weights in terms of a deep Markovian prior, which is able to characterize nonlinear temporal dynamics. We parameterize the corresponding variational distribution using a bidirectional recurrent network. This results in a flexible family of hierarchical deep generative factor analysis models that can be extended to perform time series clustering, or perform factor analysis in the presence of a control signal. Our experiments, which consider simulated data, fMRI data, and traffic data, demonstrate that DMSTF outperforms related methods in terms of reconstruction accuracy and can perform forecasting in a variety of domains with nonlinear temporal transitions.

Learning discrete state abstractions with deep variational inference

Mar 09, 2020

Ondrej Biza, Robert Platt, Jan-Willem van de Meent, Lawson L. S. Wong

Abstract: Abstraction is crucial for effective sequential decision making in domains with large state spaces. In this work, we propose a variational information bottleneck method for learning approximate bisimulations, a type of state abstraction. We use a deep neural net encoder to map states onto continuous embeddings. The continuous latent space is then compressed into a discrete representation using an action-conditioned hidden Markov model, which is trained end-to-end with the neural network. Our method is suited for environments with high-dimensional states and learns from a stream of experience collected by an agent acting in a Markov decision process. Through a learned discrete abstract model, we can efficiently plan for unseen goals in a multi-goal reinforcement learning setting. We test our method in simplified robotic manipulation domains with image states. We also compare it against previous model-based approaches to finding bisimulations in discrete grid-world-like environments.

* 15 pages, 7 figures

Evaluating Combinatorial Generalization in Variational Autoencoders

Nov 11, 2019

Alican Bozkurt, Babak Esmaeili, Dana H. Brooks, Jennifer G. Dy, Jan-Willem van de Meent

Abstract: We evaluate the ability of variational autoencoders to generalize to unseen examples in domains with a large combinatorial space of feature values. Our experiments systematically evaluate the effect of network width, depth, regularization, and the typical distance between the training and test examples. Increasing network capacity benefits generalization in easy problems, where test-set examples are similar to training examples. In more difficult problems, increasing capacity deteriorates generalization when optimizing the standard VAE objective, but once again improves generalization when we decrease the KL regularization. Our results establish that the interplay between model capacity and KL regularization is not clear-cut; we need to take the typical distance between train and test examples into account when evaluating generalization.
