Finale Doshi-Velez

Comparison and Unification of Three Regularization Methods in Batch Reinforcement Learning

Sep 16, 2021

Sarah Rathnam, Susan A. Murphy, Finale Doshi-Velez

Abstract: In batch reinforcement learning, there can be poorly explored state-action pairs resulting in poorly learned, inaccurate models and poorly performing associated policies. Various regularization methods can mitigate the problem of learning overly complex models in Markov decision processes (MDPs); however, they operate in technically and intuitively distinct ways and lack a common form in which to compare them. This paper unifies three regularization methods in a common framework: a weighted average transition matrix. Considering regularization methods in this common form illuminates how the MDP structure and the state-action pair distribution of the batch data set influence the relative performance of regularization methods. We confirm intuitions generated from the common framework by empirical evaluation across a range of MDPs and data collection policies.

* ICML Workshop on Reinforcement Learning Theory 2021
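
To make the "weighted average transition matrix" idea in the abstract above concrete, here is a minimal sketch. It is not the paper's formulation: the abstract does not give the weights, so the count-based weight `n / (n + strength)`, the function names, and the `prior` argument are all assumptions chosen for illustration. Each row corresponds to one state-action pair.

```python
def mle_transitions(counts):
    """Row-normalize transition counts; rows with no data fall back to uniform."""
    n_next = len(counts[0])
    rows = []
    for row in counts:
        total = sum(row)
        if total == 0:
            rows.append([1.0 / n_next] * n_next)
        else:
            rows.append([c / total for c in row])
    return rows


def weighted_average_transitions(counts, prior, strength):
    """Blend the empirical row with a prior row for each state-action pair.

    Assumed weighting (illustrative only): the empirical estimate gets weight
    n / (n + strength), where n is the number of observed transitions for that
    pair, so poorly explored pairs lean more heavily on the prior.
    `strength` must be positive.
    """
    if strength <= 0:
        raise ValueError("strength must be positive")
    mle = mle_transitions(counts)
    blended = []
    for row_counts, mle_row, prior_row in zip(counts, mle, prior):
        n = sum(row_counts)
        w = n / (n + strength)
        blended.append([w * p + (1.0 - w) * q for p, q in zip(mle_row, prior_row)])
    return blended
```

In this sketch a pair with many observations keeps an estimate close to its empirical frequencies, while an unvisited pair receives the prior row unchanged.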

Pre-emptive learning-to-defer for sequential medical decision-making under uncertainty

Sep 13, 2021

Shalmali Joshi, Sonali Parbhoo, Finale Doshi-Velez

Abstract: We propose SLTD ("Sequential Learning-to-Defer"), a framework for learning-to-defer pre-emptively to an expert in sequential decision-making settings. SLTD measures the likelihood of improving value of deferring now versus later based on the underlying uncertainty in dynamics. In particular, we focus on the non-stationarity in the dynamics to accurately learn the deferral policy. We demonstrate our pre-emptive deferral can identify regions where the current policy has a low probability of improving outcomes. SLTD outperforms existing non-sequential learning-to-defer baselines, whilst reducing overall uncertainty on multiple synthetic and real-world simulators with non-stationary dynamics. We further derive and decompose the propagated (long-term) uncertainty for interpretation by the domain expert to provide an indication of when the model's performance is reliable.

State Relevance for Off-Policy Evaluation

Sep 13, 2021

Simon P. Shen, Yecheng Jason Ma, Omer Gottesman, Finale Doshi-Velez

Abstract: Importance sampling-based estimators for off-policy evaluation (OPE) are valued for their simplicity, unbiasedness, and reliance on relatively few assumptions. However, the variance of these estimators is often high, especially when trajectories are of different lengths. In this work, we introduce Omitting-States-Irrelevant-to-Return Importance Sampling (OSIRIS), an estimator which reduces variance by strategically omitting likelihood ratios associated with certain states. We formalize the conditions under which OSIRIS is unbiased and has lower variance than ordinary importance sampling, and we demonstrate these properties empirically.

* Proceedings of the 38th International Conference on Machine Learning, PMLR 139:9537-9546, 2021
* ICML 2021
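
As a rough illustration of the estimator family described in the abstract above, the sketch below implements ordinary trajectory-wise importance sampling and lets the caller skip the likelihood ratios of selected states. It is not the OSIRIS procedure itself: the paper's rule for deciding which states are irrelevant to the return is not reproduced here, and the `omit_states` argument, the dictionary-based policies, and the function name are hypothetical.

```python
def importance_sampling_estimate(trajectories, pi_e, pi_b, gamma=1.0,
                                 omit_states=frozenset()):
    """Estimate the value of the evaluation policy from behavior-policy data.

    trajectories: list of trajectories, each a list of (state, action, reward).
    pi_e, pi_b:   dicts mapping (state, action) to action probability under the
                  evaluation and behavior policies.
    omit_states:  states whose likelihood ratios are left out of the weight
                  (hypothetical input; choosing them soundly is the hard part).
    """
    total = 0.0
    for traj in trajectories:
        weight = 1.0
        discounted_return = 0.0
        for t, (state, action, reward) in enumerate(traj):
            if state not in omit_states:
                weight *= pi_e[(state, action)] / pi_b[(state, action)]
            discounted_return += (gamma ** t) * reward
        total += weight * discounted_return
    return total / len(trajectories)
```

With an empty `omit_states` this reduces to ordinary importance sampling. Dropping ratios shortens the product of weights, which is the source of the variance reduction the abstract refers to, but it stays unbiased only under conditions such as those the paper formalizes.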

Online structural kernel selection for mobile health

Jul 21, 2021

Eura Shin, Pedja Klasnja, Susan Murphy, Finale Doshi-Velez

Abstract: Motivated by the need for efficient and personalized learning in mobile health, we investigate the problem of online kernel selection for Gaussian Process regression in the multi-task setting. We propose a novel generative process on the kernel composition for this purpose. Our method demonstrates that trajectories of kernel evolutions can be transferred between users to improve learning and that the kernels themselves are meaningful for an mHealth prediction goal.

* Workshop paper in ICML IMLH 2021
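
For readers unfamiliar with "kernel composition", the sketch below shows the general idea of building a structured Gaussian process kernel from base kernels using sums and products. It does not reproduce the paper's generative process or its selection procedure; the choice of base kernels, their parameters, and all function names are assumptions for illustration.

```python
import math


def rbf(lengthscale):
    """Squared-exponential kernel on scalar inputs."""
    return lambda x, y: math.exp(-0.5 * ((x - y) / lengthscale) ** 2)


def periodic(period, lengthscale):
    """Standard periodic kernel on scalar inputs."""
    def k(x, y):
        s = math.sin(math.pi * abs(x - y) / period)
        return math.exp(-2.0 * s * s / lengthscale ** 2)
    return k


def add(k1, k2):
    """Sum of two kernels, which is again a valid kernel."""
    return lambda x, y: k1(x, y) + k2(x, y)


def mul(k1, k2):
    """Product of two kernels, which is again a valid kernel."""
    return lambda x, y: k1(x, y) * k2(x, y)


# Example composition: a smooth trend plus a locally periodic component.
trend_plus_local_periodic = add(rbf(5.0), mul(rbf(2.0), periodic(1.0, 1.0)))
```

A kernel selection method searches over expressions of this kind; the composition chosen for one user could then serve as a starting point for another, which is the sort of transfer the abstract describes.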

Promises and Pitfalls of Black-Box Concept Learning Models

Jun 24, 2021

Anita Mahinpei, Justin Clark, Isaac Lage, Finale Doshi-Velez, Weiwei Pan

Abstract: Machine learning models that incorporate concept learning as an intermediate step in their decision-making process can match the performance of black-box predictive models while retaining the ability to explain outcomes in human-understandable terms. However, we demonstrate that the concept representations learned by these models encode information beyond the pre-defined concepts, and that natural mitigation strategies do not fully work, rendering the interpretation of the downstream prediction misleading. We describe the mechanism underlying the information leakage and suggest recourse for mitigating its effects.

Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning

Jun 14, 2021

Kai Wang, Sanket Shah, Haipeng Chen, Andrew Perrault, Finale Doshi-Velez, Milind Tambe

Abstract: In the predict-then-optimize framework, the objective is to train a predictive model, mapping from environment features to parameters of an optimization problem, which maximizes decision quality when the optimization is subsequently solved. Recent work on decision-focused learning shows that embedding the optimization problem in the training pipeline can improve decision quality and help generalize better to unseen tasks compared to relying on an intermediate loss function for evaluating prediction quality. We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) that are solved via reinforcement learning. In particular, we are given environment features and a set of trajectories from training MDPs, which we use to train a predictive model that generalizes to unseen test MDPs without trajectories. Two significant computational challenges arise in applying decision-focused learning to MDPs: (i) large state and action spaces make it infeasible for existing techniques to differentiate through MDP problems, and (ii) the high-dimensional policy space, as parameterized by a neural network, makes differentiating through a policy expensive. We resolve the first challenge by sampling provably unbiased derivatives to approximate and differentiate through optimality conditions, and the second challenge by using a low-rank approximation to the high-dimensional sample-based derivatives. We implement both Bellman-based and policy gradient-based decision-focused learning on three different MDP problems with missing parameters, and show that decision-focused learning performs better in generalization to unseen tasks.

Wide Mean-Field Variational Bayesian Neural Networks Ignore the Data

Jun 13, 2021

Beau Coker, Weiwei Pan, Finale Doshi-Velez

Abstract: Variational inference enables approximate posterior inference of the highly over-parameterized neural networks that are popular in modern machine learning. Unfortunately, such posteriors are known to exhibit various pathological behaviors. We prove that as the number of hidden units in a single-layer Bayesian neural network tends to infinity, the function-space posterior mean under mean-field variational inference actually converges to zero, completely ignoring the data. This is in contrast to the true posterior, which converges to a Gaussian process. Our work provides insight into the over-regularization of the KL divergence in variational inference.

Learning Under Adversarial and Interventional Shifts

Mar 29, 2021

Harvineet Singh, Shalmali Joshi, Finale Doshi-Velez, Himabindu Lakkaraju

Abstract: Machine learning models are often trained on data from one distribution and deployed on others. It is therefore important to design models that are robust to distribution shifts. Most of the existing work focuses on optimizing for either adversarial shifts or interventional shifts. Adversarial methods lack expressivity in representing plausible shifts as they consider shifts to joint distributions in the data. Interventional methods allow more expressivity but provide robustness to unbounded shifts, resulting in overly conservative models. In this work, we combine the complementary strengths of the two approaches and propose a new formulation, RISe, for designing robust models against a set of distribution shifts that are at the intersection of adversarial and interventional shifts. We employ the distributionally robust optimization framework to optimize the resulting objective in both supervised and reinforcement learning settings. Extensive experimentation with synthetic and real-world datasets from healthcare demonstrates the efficacy of the proposed approach.

* 19 pages including 5 pages appendix, 6 figures, 2 tables. Preliminary version presented at Causal Discovery & Causality-Inspired Machine Learning Workshop 2020

Benchmarks, Algorithms, and Metrics for Hierarchical Disentanglement

Feb 09, 2021

Andrew Slavin Ross, Finale Doshi-Velez

Abstract: In representation learning, there has been recent interest in developing algorithms to disentangle the ground-truth generative factors behind data, and metrics to quantify how fully this occurs. However, these algorithms and metrics often assume that both representations and ground-truth factors are flat, continuous, and factorized, whereas many real-world generative processes involve rich hierarchical structure, mixtures of discrete and continuous variables with dependence between them, and even varying intrinsic dimensionality. In this work, we develop benchmarks, algorithms, and metrics for learning such hierarchical representations.

Evaluating the Interpretability of Generative Models by Interactive Reconstruction

Feb 02, 2021

Andrew Slavin Ross, Nina Chen, Elisa Zhao Hang, Elena L. Glassman, Finale Doshi-Velez

Abstract: For machine learning models to be most useful in numerous sociotechnical systems, many have argued that they must be human-interpretable. However, despite increasing interest in interpretability, there remains no firm consensus on how to measure it. This is especially true in representation learning, where interpretability research has focused on "disentanglement" measures only applicable to synthetic datasets and not grounded in human factors. We introduce a task to quantify the human-interpretability of generative model representations, where users interactively modify representations to reconstruct target instances. On synthetic datasets, we find performance on this task much more reliably differentiates entangled and disentangled models than baseline approaches. On a real dataset, we find it differentiates between representation learning methods widely believed but never shown to produce more or less interpretable models. In both cases, we ran small-scale think-aloud studies and large-scale experiments on Amazon Mechanical Turk to confirm that our qualitative and quantitative results agreed.

* CHI 2021 accepted paper
