David Sontag

Evaluating Reinforcement Learning Algorithms in Observational Health Settings

May 31, 2018

Omer Gottesman, Fredrik Johansson, Joshua Meier, Jack Dent, Donghun Lee, Srivatsan Srinivasan, Linying Zhang, Yi Ding, David Wihl, Xuefeng Peng(+9 more)

Abstract:Much attention has been devoted recently to the development of machine learning algorithms with the goal of improving treatment policies in healthcare. Reinforcement learning (RL) is a sub-field within machine learning that is concerned with learning how to make sequences of decisions so as to optimize long-term effects. Already, RL algorithms have been proposed to identify decision-making strategies for mechanical ventilation, sepsis management and treatment of schizophrenia. However, before implementing treatment policies learned by black-box algorithms in high-stakes clinical decision problems, special care must be taken in the evaluation of these policies. In this document, our goal is to expose some of the subtleties associated with evaluating RL algorithms in healthcare. We aim to provide a conceptual starting point for clinical and computational researchers to ask the right questions when designing and evaluating algorithms for new ways of treating patients. In the following, we describe how choices about how to summarize a history, variance of statistical estimators, and confounders in more ad-hoc measures can result in unreliable, even misleading estimates of the quality of a treatment policy. We also provide suggestions for mitigating these effects---for while there is much promise for mining observational health data to uncover better treatment policies, evaluation must be performed thoughtfully.
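
The abstract's point about the variance of statistical estimators can be illustrated with ordinary importance sampling, a standard off-policy estimator. The sketch below is not taken from the paper: the one-state toy environment, both policies, and all function names are assumptions made for illustration.

```python
import numpy as np

def importance_sampling_value(trajectories, behavior_prob, eval_prob, gamma=1.0):
    """Ordinary importance-sampling estimate of the evaluation policy's value.

    trajectories: list of lists of (state, action, reward) tuples collected
                  under the behavior policy.
    behavior_prob, eval_prob: functions (state, action) -> action probability.
    Returns the mean estimate and the per-trajectory weighted returns.
    """
    weighted_returns = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            # The trajectory weight is a product of per-step probability ratios.
            weight *= eval_prob(s, a) / behavior_prob(s, a)
            ret += (gamma ** t) * r
        weighted_returns.append(weight * ret)
    weighted_returns = np.asarray(weighted_returns)
    return float(weighted_returns.mean()), weighted_returns

# Hypothetical toy problem: one state, two actions, reward 1 for action 1.
def behavior_prob(s, a):
    return 0.5  # behavior policy picks uniformly at random

def eval_prob(s, a):
    return 0.9 if a == 1 else 0.1  # evaluation policy prefers action 1

def simulate(rng, n, horizon):
    trajs = []
    for _ in range(n):
        actions = rng.integers(0, 2, size=horizon)
        trajs.append([(0, int(a), float(a)) for a in actions])
    return trajs

rng = np.random.default_rng(0)
trajs = simulate(rng, 20000, 5)
estimate, per_traj = importance_sampling_value(trajs, behavior_prob, eval_prob)
# The true value of the evaluation policy here is 0.9 * 5 = 4.5.
print(estimate, per_traj.std())
```

Because the weight is a product of one ratio per step, its spread grows quickly with the horizon, which is one reason such estimates can be unreliable on long patient trajectories.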

Why Is My Classifier Discriminatory?

May 30, 2018

Irene Chen, Fredrik D. Johansson, David Sontag

Abstract:Recent attempts to achieve fairness in predictive models focus on the balance between fairness and accuracy. In sensitive applications such as healthcare or criminal justice, this trade-off is often undesirable as any increase in prediction error could have devastating consequences. In this work, we argue that the fairness of predictions should be evaluated in the context of the data, and that unfairness induced by inadequate sample sizes or unmeasured predictive variables should be addressed through data collection, rather than by constraining the model. We decompose cost-based metrics of discrimination into bias, variance, and noise, and propose actions aimed at estimating and reducing each term. Finally, we perform case-studies on prediction of income, mortality, and review ratings, confirming the value of this analysis. We find that data collection is often a means to reduce discrimination without sacrificing accuracy.

* 3 figures, 8 pages, 6 page supplementary
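
To make "cost-based metrics of discrimination" concrete, the sketch below computes one simple such measure: the gap in zero-one loss between groups. It is only an illustration under assumed definitions. The choice of zero-one loss, the function name `group_cost_gap`, and the toy labels are not from the paper, and the bias, variance, and noise decomposition itself is not reproduced here.

```python
import numpy as np

def group_cost_gap(y_true, y_pred, group):
    """Per-group zero-one loss and the largest gap between any two groups."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    errors = (y_true != y_pred).astype(float)
    # Average error within each protected group.
    costs = {g: errors[group == g].mean() for g in np.unique(group)}
    values = list(costs.values())
    return costs, max(values) - min(values)

# Hypothetical toy data: six examples, two groups of three.
y_true = [1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1]
group = [0, 0, 0, 1, 1, 1]
costs, gap = group_cost_gap(y_true, y_pred, group)
print(costs, gap)
```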

Optimality of Approximate Inference Algorithms on Stable Instances

Apr 23, 2018

Hunter Lang, David Sontag, Aravindan Vijayaraghavan

Abstract:Approximate algorithms for structured prediction problems---such as LP relaxations and the popular alpha-expansion algorithm (Boykov et al. 2001)---typically far exceed their theoretical performance guarantees on real-world instances. These algorithms often find solutions that are very close to optimal. The goal of this paper is to partially explain the performance of alpha-expansion and an LP relaxation algorithm on MAP inference in Ferromagnetic Potts models (FPMs). Our main results give stability conditions under which these two algorithms provably recover the optimal MAP solution. These theoretical results complement numerous empirical observations of good performance.

* 13 pages, 2 figures

Learning Weighted Representations for Generalization Across Designs

Feb 26, 2018

Fredrik D. Johansson, Nathan Kallus, Uri Shalit, David Sontag

Abstract:Predictive models that generalize well under distributional shift are often desirable and sometimes crucial to building robust and reliable machine learning applications. We focus on distributional shift that arises in causal inference from observational data and in unsupervised domain adaptation. We pose both of these problems as prediction under a shift in design. Popular methods for overcoming distributional shift make unrealistic assumptions such as having a well-specified model or knowing the policy that gave rise to the observed data. Other methods are hindered by their need for a pre-specified metric for comparing observations, or by poor asymptotic properties. We devise a bound on the generalization error under design shift, incorporating both representation learning and sample re-weighting. Based on the bound, we propose an algorithmic framework that does not require any of the above assumptions and which is asymptotically consistent. We empirically study the new framework using two synthetic datasets, and demonstrate its effectiveness compared to previous methods.

Causal Effect Inference with Deep Latent-Variable Models

Nov 06, 2017

Christos Louizos, Uri Shalit, Joris Mooij, David Sontag, Richard Zemel, Max Welling

Abstract:Learning individual-level causal effects from observational data, such as inferring the most effective medication for a specific patient, is a problem of growing importance for policy makers. The most important aspect of inferring causal effects from observational data is the handling of confounders, factors that affect both an intervention and its outcome. A carefully designed observational study attempts to measure all important confounders. However, even if one does not have direct access to all confounders, there may exist noisy and uncertain measurement of proxies for confounders. We build on recent advances in latent variable modeling to simultaneously estimate the unknown latent space summarizing the confounders and the causal effect. Our method is based on Variational Autoencoders (VAE) which follow the causal structure of inference with proxies. We show our method is significantly more robust than existing methods, and matches the state-of-the-art on previous benchmarks focused on individual treatment effects.

* Published as a conference paper at NIPS 2017

Grounded Recurrent Neural Networks

May 23, 2017

Ankit Vani, Yacine Jernite, David Sontag

Abstract:In this work, we present the Grounded Recurrent Neural Network (GRNN), a recurrent neural network architecture for multi-label prediction which explicitly ties labels to specific dimensions of the recurrent hidden state (we call this process "grounding"). The approach is particularly well-suited for extracting large numbers of concepts from text. We apply the new model to address an important problem in healthcare of understanding what medical concepts are discussed in clinical text. Using a publicly available dataset derived from Intensive Care Units, we learn to label a patient's diagnoses and procedures from their discharge summary. Our evaluation shows a clear advantage to using our proposed architecture over a variety of strong baselines.

Estimating individual treatment effect: generalization bounds and algorithms

May 16, 2017

Uri Shalit, Fredrik D. Johansson, David Sontag

Abstract:There is intense interest in applying machine learning to problems of causal inference in fields such as healthcare, economics and education. In particular, individual-level causal inference has important applications such as precision medicine. We give a new theoretical analysis and family of algorithms for predicting individual treatment effect (ITE) from observational data, under the assumption known as strong ignorability. The algorithms learn a "balanced" representation such that the induced treated and control distributions look similar. We give a novel, simple and intuitive generalization-error bound showing that the expected ITE estimation error of a representation is bounded by a sum of the standard generalization-error of that representation and the distance between the treated and control distributions induced by the representation. We use Integral Probability Metrics to measure distances between distributions, deriving explicit bounds for the Wasserstein and Maximum Mean Discrepancy (MMD) distances. Experiments on real and simulated data show the new algorithms match or outperform the state-of-the-art.

* Added name "TARNet" to refer to version with alpha = 0. Removed supp
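
As a rough illustration of the distance term in the bound described above, the sketch below computes a biased estimate of the squared Maximum Mean Discrepancy between treated and control samples. It is not the authors' implementation: the RBF kernel, the bandwidth, the synthetic data, and the function names are all assumptions, and the samples stand in for learned representations.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    """RBF kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))

def mmd_squared(treated, control, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    k_tt = rbf_kernel(treated, treated, bandwidth).mean()
    k_cc = rbf_kernel(control, control, bandwidth).mean()
    k_tc = rbf_kernel(treated, control, bandwidth).mean()
    return k_tt + k_cc - 2.0 * k_tc

rng = np.random.default_rng(0)
treated = rng.normal(0.0, 1.0, size=(200, 2))
control_same = rng.normal(0.0, 1.0, size=(200, 2))     # same distribution
control_shifted = rng.normal(2.0, 1.0, size=(200, 2))  # shifted distribution

mmd_same = mmd_squared(treated, control_same)
mmd_shifted = mmd_squared(treated, control_shifted)
print(mmd_same, mmd_shifted)
```

A representation under which this quantity is small makes the treated and control groups look alike, which is the sense of "balanced" used in the abstract.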

Discourse-Based Objectives for Fast Unsupervised Sentence Representation Learning

Apr 23, 2017

Yacine Jernite, Samuel R. Bowman, David Sontag

Abstract:This work presents a novel objective function for the unsupervised training of neural network sentence encoders. It exploits signals from paragraph-level discourse coherence to train these models to understand text. Our objective is purely discriminative, allowing us to train models many times faster than was possible under prior methods, and it yields models which perform well in extrinsic evaluations.

Simultaneous Learning of Trees and Representations for Extreme Classification and Density Estimation

Mar 02, 2017

Yacine Jernite, Anna Choromanska, David Sontag

Abstract:We consider multi-class classification where the predictor has a hierarchical structure that allows for a very large number of labels both at train and test time. The predictive power of such models can heavily depend on the structure of the tree, and although past work showed how to learn the tree structure, it expected that the feature vectors remained static. We provide a novel algorithm to simultaneously perform representation learning for the input data and learning of the hierarchical predictor. Our approach optimizes an objective function which favors balanced and easily-separable multi-way node partitions. We theoretically analyze this objective, showing that it gives rise to a boosting style property and a bound on classification error. We next show how to extend the algorithm to conditional density estimation. We empirically validate both variants of the algorithm on text classification and language modeling, respectively, and show that they compare favorably to common baselines in terms of accuracy and running time.

Structured Inference Networks for Nonlinear State Space Models

Dec 05, 2016

Rahul G. Krishnan, Uri Shalit, David Sontag

Abstract:Gaussian state space models have been used for decades as generative models of sequential data. They admit an intuitive probabilistic interpretation, have a simple functional form, and enjoy widespread adoption. We introduce a unified algorithm to efficiently learn a broad class of linear and non-linear state space models, including variants where the emission and transition distributions are modeled by deep neural networks. Our learning algorithm simultaneously learns a compiled inference network and the generative model, leveraging a structured variational approximation parameterized by recurrent neural networks to mimic the posterior distribution. We apply the learning algorithm to both synthetic and real-world datasets, demonstrating its scalability and versatility. We find that using the structured approximation to the posterior results in models with significantly higher held-out likelihood.

* To appear in the Thirty-First AAAI Conference on Artificial Intelligence, February 2017, 13 pages, 11 figures with supplement, changed to AAAI formatting style, added references
