Finale Doshi-Velez

School of Engineering and Applied Science, Harvard University, Cambridge, Massachusetts, United States

Monitoring Fidelity of Online Reinforcement Learning Algorithms in Clinical Trials

Feb 26, 2024

Anna L. Trella, Kelly W. Zhang, Inbal Nahum-Shani, Vivek Shetty, Iris Yan, Finale Doshi-Velez, Susan A. Murphy

Abstract:Online reinforcement learning (RL) algorithms offer great potential for personalizing treatment for participants in clinical trials. However, deploying an online, autonomous algorithm in the high-stakes healthcare setting makes quality control and data quality especially difficult to achieve. This paper proposes algorithm fidelity as a critical requirement for deploying online RL algorithms in clinical trials. It emphasizes the responsibility of the algorithm to (1) safeguard participants and (2) preserve the scientific utility of the data for post-trial analyses. We also present a framework for pre-deployment planning and real-time monitoring to help algorithm developers and clinical researchers ensure algorithm fidelity. To illustrate our framework's practical application, we present real-world examples from the Oralytics clinical trial. Since Spring 2023, this trial has successfully deployed an autonomous, online RL algorithm to personalize behavioral interventions for participants at risk for dental disease.

Guarantee Regions for Local Explanations

Feb 20, 2024

Marton Havasi, Sonali Parbhoo, Finale Doshi-Velez

Abstract:Interpretability methods that utilise local surrogate models (e.g. LIME) are very good at describing the behaviour of the predictive model at a point of interest, but they are not guaranteed to extrapolate to the local region surrounding the point. However, overfitting to the local curvature of the predictive model and malicious tampering can significantly limit extrapolation. We propose an anchor-based algorithm for identifying regions in which local explanations are guaranteed to be correct by explicitly describing those intervals along which the input features can be trusted. Our method produces an interpretable feature-aligned box where the prediction of the local surrogate model is guaranteed to match the predictive model. We demonstrate that our algorithm can be used to find explanations with larger guarantee regions that better cover the data manifold compared to existing baselines. We also show how our method can identify misleading local explanations with significantly poorer guarantee regions.

Non-Stationary Latent Auto-Regressive Bandits

Feb 05, 2024

Anna L. Trella, Walter Dempsey, Finale Doshi-Velez, Susan A. Murphy

Abstract:We consider the stochastic multi-armed bandit problem with non-stationary rewards. We present a novel formulation of non-stationarity in the environment where changes in the mean reward of the arms over time are due to some unknown, latent, auto-regressive (AR) state of order $k$. We call this new environment the latent AR bandit. Different forms of the latent AR bandit appear in many real-world settings, especially in emerging scientific fields such as behavioral health or education where there are few mechanistic models of the environment. If the AR order $k$ is known, we propose an algorithm that achieves $\tilde{O}(k\sqrt{T})$ regret in this setting. Empirically, our algorithm outperforms standard UCB across multiple non-stationary environments, even if $k$ is mis-specified.
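
To make the setting concrete, the sketch below simulates an environment in which every arm's mean reward is driven by one latent AR($k$) state. It is an illustration only: the linear link between the latent state and each arm's mean, the Gaussian noise, and all names (`simulate_latent_ar_bandit`, `arm_intercepts`, `arm_slopes`) are assumptions, not the formulation or algorithm from the paper.

```python
import random


def simulate_latent_ar_bandit(ar_coefs, arm_intercepts, arm_slopes,
                              horizon, noise_sd=0.1, seed=0):
    """Return, for each time step, the mean reward of every arm.

    A single latent state z_t follows an AR(k) process, where k is the
    number of coefficients in ar_coefs. Each arm's mean is assumed
    (hypothetically) to be a linear function of z_t.
    """
    rng = random.Random(seed)
    k = len(ar_coefs)
    history = [0.0] * k  # last k latent states, most recent first
    means = []
    for _ in range(horizon):
        # AR(k) update: weighted sum of the past k states plus Gaussian noise.
        z = sum(c * h for c, h in zip(ar_coefs, history))
        z += rng.gauss(0.0, noise_sd)
        history = [z] + history[:-1]
        # Assumed linear link from the latent state to each arm's mean.
        means.append([b0 + b1 * z
                      for b0, b1 in zip(arm_intercepts, arm_slopes)])
    return means


# Two arms whose means move in opposite directions with the latent state.
means = simulate_latent_ar_bandit(
    ar_coefs=[0.5, 0.3],
    arm_intercepts=[0.0, 1.0],
    arm_slopes=[1.0, -1.0],
    horizon=50,
)
```

Because the best arm changes as the latent state drifts, a learner that assumes fixed means can lock onto an arm that is no longer optimal, which is the difficulty the abstract describes.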

Semi-parametric Expert Bayesian Network Learning with Gaussian Processes and Horseshoe Priors

Jan 29, 2024

Yidou Weng, Finale Doshi-Velez

Abstract:This paper proposes a model that learns semi-parametric relationships in an Expert Bayesian Network (SEBN) with linear parameter and structure constraints. We use Gaussian Processes and a Horseshoe prior to introduce minimal nonlinear components. To prioritize modifying the expert graph over adding new edges, we optimize differential Horseshoe scales. In real-world datasets with unknown truth, we generate diverse graphs to accommodate user input, addressing identifiability issues and enhancing interpretability. Evaluation on synthetic and UCI Liver Disorders datasets, using metrics like Structural Hamming Distance and test likelihood, demonstrates that our models outperform a state-of-the-art semi-parametric Bayesian Network model.

* 8 pages, 4 figures, AAAI-2024 workshops

Reinforcement Learning Interventions on Boundedly Rational Human Agents in Frictionful Tasks

Jan 26, 2024

Eura Nofshin, Siddharth Swaroop, Weiwei Pan, Susan Murphy, Finale Doshi-Velez

Abstract:Many important behavior changes are frictionful; they require individuals to expend effort over a long period with little immediate gratification. Here, an artificial intelligence (AI) agent can provide personalized interventions to help individuals stick to their goals. In these settings, the AI agent must personalize rapidly (before the individual disengages) and interpretably, to help us understand the behavioral interventions. In this paper, we introduce Behavior Model Reinforcement Learning (BMRL), a framework in which an AI agent intervenes on the parameters of a Markov Decision Process (MDP) belonging to a boundedly rational human agent. Our formulation of the human decision-maker as a planning agent allows us to attribute undesirable human policies (ones that do not lead to the goal) to their maladapted MDP parameters, such as an extremely low discount factor. Furthermore, we propose a class of tractable human models that captures fundamental behaviors in frictionful tasks. Introducing a notion of MDP equivalence specific to BMRL, we theoretically and empirically show that AI planning with our human models can lead to helpful policies on a wide range of more complex, ground-truth humans.

* In AAMAS 2024

Toward Computationally Efficient Inverse Reinforcement Learning via Reward Shaping

Dec 18, 2023

Lauren H. Cooke, Harvey Klyne, Edwin Zhang, Cassidy Laidlaw, Milind Tambe, Finale Doshi-Velez

Abstract:Inverse reinforcement learning (IRL) is computationally challenging, with common approaches requiring the solution of multiple reinforcement learning (RL) sub-problems. This work motivates the use of potential-based reward shaping to reduce the computational burden of each RL sub-problem. This work serves as a proof of concept and, we hope, will inspire future developments towards computationally efficient IRL.
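
As background, potential-based reward shaping in its standard form adds $\gamma \Phi(s') - \Phi(s)$ to the reward of each transition from $s$ to $s'$, for some potential function $\Phi$. The sketch below checks numerically that the shaping terms telescope along a trajectory, so the shaped return differs from the original only by a term that depends on the first and last states. The potentials, rewards, and function names are made up for illustration; this is not the procedure from the paper.

```python
def shaped_reward(reward, potential_s, potential_next, gamma):
    """Standard potential-based shaping: r + gamma * Phi(s') - Phi(s)."""
    return reward + gamma * potential_next - potential_s


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite trajectory."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


gamma = 0.9
rewards = [0.0, 0.0, 1.0]            # hypothetical sparse rewards
potentials = [1.0, 2.0, 0.5, 0.0]    # hypothetical Phi(s_0), ..., Phi(s_3)

shaped = [
    shaped_reward(r, potentials[t], potentials[t + 1], gamma)
    for t, r in enumerate(rewards)
]

original_return = discounted_return(rewards, gamma)
shaped_return = discounted_return(shaped, gamma)

# Telescoping: the difference is gamma**T * Phi(s_T) - Phi(s_0).
horizon = len(rewards)
offset = (gamma ** horizon) * potentials[-1] - potentials[0]
```

Since the offset does not depend on the actions taken in between, shaping of this form changes how dense the reward signal is without changing which trajectories from a given start state are preferred.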

Signature Activation: A Sparse Signal View for Holistic Saliency

Sep 20, 2023

Jose Roberto Tello Ayala, Akl C. Fahed, Weiwei Pan, Eugene V. Pomerantsev, Patrick T. Ellinor, Anthony Philippakis, Finale Doshi-Velez

Abstract:The adoption of machine learning in healthcare calls for model transparency and explainability. In this work, we introduce Signature Activation, a saliency method that generates holistic and class-agnostic explanations for Convolutional Neural Network (CNN) outputs. Our method exploits the fact that certain kinds of medical images, such as angiograms, have clear foreground and background objects. We give a theoretical explanation to justify our method. We show the potential use of our method in clinical settings by evaluating its efficacy for aiding the detection of lesions in coronary angiograms.

Why do universal adversarial attacks work on large language models?: Geometry might be the answer

Sep 01, 2023

Varshini Subhash, Anna Bialas, Weiwei Pan, Finale Doshi-Velez

Abstract:Transformer-based large language models with emergent capabilities are becoming increasingly ubiquitous in society. However, the task of understanding and interpreting their internal workings, in the context of adversarial attacks, remains largely unsolved. Gradient-based universal adversarial attacks have been shown to be highly effective on large language models and potentially dangerous due to their input-agnostic nature. This work presents a novel geometric perspective explaining universal adversarial attacks on large language models. By attacking the 117M parameter GPT-2 model, we find evidence indicating that universal adversarial triggers could be embedding vectors which merely approximate the semantic information in their adversarial training region. This hypothesis is supported by white-box model analysis comprising dimensionality reduction and similarity measurement of hidden representations. We believe this new geometric perspective on the underlying mechanism driving universal attacks could help us gain deeper insight into the internal workings and failure modes of LLMs, thus enabling their mitigation.

* 2nd AdvML Frontiers Workshop at 40th International Conference on Machine Learning, Honolulu, Hawaii, USA, 2023

Bayesian Inverse Transition Learning for Offline Settings

Aug 09, 2023

Leo Benac, Sonali Parbhoo, Finale Doshi-Velez

Abstract:Offline reinforcement learning is commonly used for sequential decision-making in domains such as healthcare and education, where the rewards are known and the transition dynamics $T$ must be estimated on the basis of batch data. A key challenge for all tasks is how to learn a reliable estimate of the transition dynamics $T$ that produces near-optimal policies that are safe enough that they never take actions far from the best action with respect to their value functions, and informative enough that they communicate their uncertainties. Using data from an expert, we propose a new gradient-free, constraint-based approach that captures our desiderata for reliably learning a posterior distribution of the transition dynamics $T$. Our results demonstrate that by using our constraints, we learn a high-performing policy while considerably reducing the policy's variance over different datasets. We also explain how combining uncertainty estimation with these constraints can help us infer a partial ranking of actions that produce higher returns, and help us infer safer and more informative policies for planning.

* 8 pages, 1 plot, 2 tables

SAP-sLDA: An Interpretable Interface for Exploring Unstructured Text

Jul 28, 2023

Charumathi Badrinath, Weiwei Pan, Finale Doshi-Velez

Abstract:A common way to explore text corpora is through low-dimensional projections of the documents, where one hopes that thematically similar documents will be clustered together in the projected space. However, popular algorithms for dimensionality reduction of text corpora, like Latent Dirichlet Allocation (LDA), often produce projections that do not capture human notions of document similarity. We propose a semi-supervised human-in-the-loop LDA-based method for learning topics that preserve semantically meaningful relationships between documents in low-dimensional projections. On synthetic corpora, our method yields more interpretable projections than baseline methods with only a fraction of labels provided. On a real corpus, we obtain qualitatively similar results.
