Ricardo Silva

IV-ICL: Bounding Causal Effects with Instrumental Variables via In-Context Learning

May 13, 2026

Vahid Balazadeh, Hamidreza Kamkari, Medha Barath, Ricardo Silva, Rahul G. Krishnan

Abstract: The instrumental-variables (IV) setting is standard for partial identification of causal effects when unobserved confounding makes point identification impossible. Existing approaches face methodological bottlenecks: closed-form bound estimands are required (e.g., the Balke-Pearl equations in binary IV), and even when available, designing accurate estimators requires manual effort tailored to each estimand. While direct Bayesian inference of the causal effects, instead of the bounds, circumvents these challenges, it is often computationally intensive and suffers from high prior sensitivity or under-dispersed posteriors. As a remedy, we introduce IV-ICL, an amortized Bayesian in-context learning method that learns the marginal posterior distribution of the causal effects directly and derives bounds as its quantiles. Unlike standard variational inference, which optimizes the exclusive KL divergence, amortized Bayesian inference minimizes the expected inclusive KL, a mass-covering objective. We empirically observe that optimizing the inclusive KL can recover the entire identified set across diverse data-generating processes, while exclusive-KL optimization (e.g., with variational inference) of the same Bayesian formulation collapses onto a single mode and fails to cover the identified set. We evaluate IV-ICL on synthetic and semi-synthetic IV benchmarks and show that it produces intervals that are more reliably valid and more informative than efficient semi-parametric, Bayesian, and plug-in baselines, at 20-500x lower inference time. Beyond methodology, we propose a procedure to convert randomized controlled trials into IV benchmarks with provably preserved ground-truth causal effects, enabling a more realistic evaluation of partial-identification methods.
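
For readers unfamiliar with the two objectives contrasted in this abstract, the standard definitions can be written as follows. The notation is an assumption made here for illustration and is not taken from the paper: $\tau$ denotes the causal effect, $D$ a dataset, $p(\tau \mid D)$ the exact posterior, and $q_\phi$ the learned approximation.

```latex
% Exclusive (reverse) KL, the objective of standard variational inference
% for one fixed dataset D; it is mode-seeking.
\mathrm{KL}\big(q_\phi(\tau) \,\|\, p(\tau \mid D)\big)
  = \mathbb{E}_{q_\phi(\tau)}\big[\log q_\phi(\tau) - \log p(\tau \mid D)\big]

% Expected inclusive (forward) KL, averaged over datasets drawn from the
% prior predictive p(D); it is mass-covering.
\mathbb{E}_{p(D)}\Big[\mathrm{KL}\big(p(\tau \mid D) \,\|\, q_\phi(\tau \mid D)\big)\Big]
  = \mathbb{E}_{p(\tau, D)}\big[\log p(\tau \mid D) - \log q_\phi(\tau \mid D)\big]
```

Because the term $\log p(\tau \mid D)$ does not depend on $\phi$, minimizing the second objective is equivalent to maximizing $\mathbb{E}_{p(\tau, D)}[\log q_\phi(\tau \mid D)]$, which needs only samples $(\tau, D)$ simulated from the joint model.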

Correcting heterogeneous diagnostic bias when developing clinical prediction models using causal hidden Markov models

May 07, 2026

Jose Benitez-Aurioles, Ricardo Silva, Brian McMillan, Matthew Sperrin

Abstract: In routine care, individuals identified a priori as high-risk are usually tested for conditions more frequently. Protected attributes, such as sex or ethnicity, may also determine testing frequency. Such heterogeneous detection rates across a population induce label error. This causes systematic model error for specific groups and biases performance metrics during validation. This paper proposes a method to correct for such bias in prediction models due to differential diagnostic delay. We use a causal inference framework to define our target estimand: an individual's diagnosis probability in a counterfactual scenario where their diagnosis rate matches that of a reference group. We model the longitudinal process as a hidden Markov model, in which confirmatory test results are emissions from a latent progressive disease stage. We validate our approach in simulated data and apply it to a case study of chronic kidney disease prediction using electronic health records. In simulations, our method reduces prediction bias and improves calibration-in-the-large, correcting the Observed:Expected ratio in the underdiagnosed group from 1.34 (standard deviation: 0.09) in a model developed without any correction for underdiagnosis bias to 1.02 (0.09). Violations of assumptions in the simulation affected the estimation of model parameters, but the proposed approach nonetheless remained better calibrated than the standard model. In the clinical case study, we identify diabetes as the main driver of observability, with an odds ratio of 10.36 (95% confidence interval, 9.80 - 11.02) in 6-month urine albumin-creatinine ratio testing rate. Using our approach to predict the counterfactual diagnostic rate in patients without diabetes, we improved the Observed:Expected ratio of a developed clinical prediction model from 1.55 (1.51 - 1.59) to 1.01 (0.98 - 1.04).
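
The Observed:Expected ratio reported in this abstract is a standard calibration-in-the-large summary: the number of observed events divided by the sum of predicted probabilities. The sketch below is illustrative only; the function name and the toy data are assumptions made here, not the authors' code or results.

```python
def observed_expected_ratio(outcomes, predicted_probs):
    """Return observed events divided by expected events.

    outcomes: iterable of 0/1 labels (1 = event observed).
    predicted_probs: iterable of predicted event probabilities.
    A ratio above 1 means the model predicts fewer events than were
    observed (under-prediction); below 1 means over-prediction.
    """
    outcomes = list(outcomes)
    predicted_probs = list(predicted_probs)
    if len(outcomes) != len(predicted_probs):
        raise ValueError("outcomes and predicted_probs must have equal length")
    expected = sum(predicted_probs)
    if expected <= 0:
        raise ValueError("sum of predicted probabilities must be positive")
    return sum(outcomes) / expected


# Toy data (hypothetical): 4 observed events, 3.0 expected events.
toy_outcomes = [1, 0, 1, 1, 0, 0, 1, 0]
toy_probs = [0.5, 0.25, 0.5, 0.25, 0.25, 0.25, 0.5, 0.5]
toy_ratio = observed_expected_ratio(toy_outcomes, toy_probs)
print(round(toy_ratio, 2))
```

On the toy data the ratio is 4 / 3.0, so the hypothetical model under-predicts; a well-calibrated model would give a value close to 1.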

* 4 figures, 2 tables, 4 supplementaries

A Causal Framework for Mitigating Data Shifts in Healthcare

Mar 13, 2026

Kurt Butler, Stephanie Riley, Damian Machlanski, Edward Moroshko, Panagiotis Dimitrakopoulos, Thomas Melistas, Akchunya Chanchal, Konstantinos Vilouras, Zhihua Liu, Steven McDonagh(+6 more)

Abstract: Developing predictive models that perform reliably across diverse patient populations and heterogeneous environments is a core aim of medical research. However, generalization is only possible if the learned model is robust to statistical differences between data used for training and data seen at the time and place of deployment. Domain generalization methods provide strategies to address data shifts, but each method comes with its own set of assumptions and trade-offs. To apply these methods in healthcare, we must understand how domain shifts arise, what assumptions we prefer to make, and what our design constraints are. This article proposes a causal framework for the design of predictive models to improve generalization. Causality provides a powerful language to characterize and understand diverse domain shifts, regardless of data modality. This allows us to pinpoint why models fail to generalize, leading to more principled strategies to prepare for and adapt to shifts. We recommend general mitigation strategies, discussing trade-offs and highlighting existing work. Our causality-based perspective offers a critical foundation for developing robust, interpretable, and clinically relevant AI solutions in healthcare, paving the way for reliable real-world deployment.

* 21 pages, 3 figures

Dual Risk Minimization: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models

Nov 29, 2024

Kaican Li, Weiyan Xie, Yongxiang Huang, Didan Deng, Lanqing Hong, Zhenguo Li, Ricardo Silva, Nevin L. Zhang

Abstract: Fine-tuning foundation models often compromises their robustness to distribution shifts. To remedy this, most robust fine-tuning methods aim to preserve the pre-trained features. However, not all pre-trained features are robust, and those methods are largely indifferent to which ones to preserve. We propose dual risk minimization (DRM), which combines empirical risk minimization with worst-case risk minimization, to better preserve the core features of downstream tasks. In particular, we utilize core-feature descriptions generated by LLMs to induce core-based zero-shot predictions, which then serve as proxies to estimate the worst-case risk. DRM balances two crucial aspects of model robustness, expected performance and worst-case performance, establishing a new state of the art on various real-world benchmarks. DRM significantly improves the out-of-distribution performance of CLIP ViT-L/14@336 on ImageNet (75.9 to 77.1), WILDS-iWildCam (47.1 to 51.8), and WILDS-FMoW (50.7 to 53.1), opening up new avenues for robust fine-tuning. Our code is available at https://github.com/vaynexie/DRM.

* NeurIPS 2024

Structured Learning of Compositional Sequential Interventions

Jun 09, 2024

Jialin Yu, Andreas Koukorinis, Nicolò Colombo, Yuchen Zhu, Ricardo Silva

Abstract: We consider sequential treatment regimes where each unit is exposed to combinations of interventions over time. When interventions are described by qualitative labels, such as "close schools for a month due to a pandemic" or "promote this podcast to this user during this week", it is unclear which structural assumptions allow us to generalize behavioral predictions to previously unseen combinatorial sequences. Standard black-box approaches mapping sequences of categorical variables to outputs are applicable, but they rely on poorly understood assumptions on how reliable generalization can be obtained, and may underperform under sparse sequences, temporal variability, and large action spaces. To address this, we pose an explicit model for composition, that is, how the effect of sequential interventions can be isolated into modules, clarifying which data conditions allow for the identification of their combined effect at different units and time steps. We show the identification properties of our compositional model, inspired by advances in causal matrix factorization methods but focusing on predictive models for novel compositions of interventions instead of matrix completion tasks and causal effect estimation. We compare our approach to flexible but generic black-box models to illustrate how structure aids prediction in sparse data conditions.

Bounding Causal Effects with Leaky Instruments

Apr 05, 2024

David S. Watson, Jordan Penn, Lee M. Gunderson, Gecia Bravo-Hermsdorff, Afsaneh Mastouri, Ricardo Silva

Abstract: Instrumental variables (IVs) are a popular and powerful tool for estimating causal effects in the presence of unobserved confounding. However, classical approaches rely on strong assumptions such as the exclusion criterion, which states that instrumental effects must be entirely mediated by treatments. This assumption often fails in practice. When IV methods are improperly applied to data that do not meet the exclusion criterion, estimated causal effects may be badly biased. In this work, we propose a novel solution that provides partial identification in linear models given a set of leaky instruments, which are allowed to violate the exclusion criterion to some limited degree. We derive a convex optimization objective that provides provably sharp bounds on the average treatment effect under some common forms of information leakage, and implement inference procedures to quantify the uncertainty of resulting estimates. We demonstrate our method in a set of experiments with simulated data, where it performs favorably against the state of the art.

* 22 pages, 9 figures

Counterfactual Fairness Is Not Demographic Parity, and Other Observations

Feb 05, 2024

Ricardo Silva

Abstract: Blanket statements of equivalence between causal concepts and purely probabilistic concepts should be approached with care. In this short note, I examine a recent claim that counterfactual fairness is equivalent to demographic parity. The claim fails to hold up upon closer examination. I will take the opportunity to address some broader misunderstandings about counterfactual fairness.

* 17 pages, 2 figures

Intervention Generalization: A View from Factor Graph Models

Jun 06, 2023

Gecia Bravo-Hermsdorff, David S. Watson, Jialin Yu, Jakob Zeitler, Ricardo Silva

Abstract: One of the goals of causal inference is to generalize from past experiments and observational data to novel conditions. While it is in principle possible to eventually learn a mapping from a novel experimental condition to an outcome of interest, provided a sufficient variety of experiments is available in the training data, coping with a large combinatorial space of possible interventions is hard. Under a typical sparse experimental design, this mapping is ill-posed without relying on heavy regularization or prior distributions. Such assumptions may or may not be reliable, and can be hard to defend or test. In this paper, we take a close look at how to warrant a leap from past experiments to novel conditions based on minimal assumptions about the factorization of the distribution of the manipulated system, communicated in the well-understood language of factor graph models. A postulated interventional factor model (IFM) may not always be informative, but it conveniently abstracts away a need for explicit unmeasured confounding and feedback mechanisms, leading to directly testable claims. We derive necessary and sufficient conditions for causal effect identifiability with IFMs using data from a collection of experimental settings, and implement practical algorithms for generalizing expected outcomes to novel conditions never observed in the data.

Spawrious: A Benchmark for Fine Control of Spurious Correlation Biases

Mar 09, 2023

Aengus Lynch, Gbètondji J-S Dovonon, Jean Kaddour, Ricardo Silva

Abstract: The problem of spurious correlations (SCs) arises when a classifier relies on non-predictive features that happen to be correlated with the labels in the training data. For example, a classifier may misclassify dog breeds based on the background of dog images. This happens when the backgrounds are correlated with other breeds in the training data, leading to misclassifications during test time. Previous SC benchmark datasets suffer from varying issues, e.g., over-saturation or only containing one-to-one (O2O) SCs, but no many-to-many (M2M) SCs arising between groups of spurious attributes and classes. In this paper, we present Spawrious-{O2O, M2M}-{Easy, Medium, Hard}, an image classification benchmark suite containing spurious correlations among different dog breeds and background locations. To create this dataset, we employ a text-to-image model to generate photo-realistic images, and an image captioning model to filter out unsuitable ones. The resulting dataset is of high quality, containing approximately 152,000 images. Our experimental results demonstrate that state-of-the-art group robustness methods struggle with Spawrious, most notably on the Hard splits, with below 60% accuracy. By examining model misclassifications, we detect reliance on spurious backgrounds, demonstrating that our dataset provides a significant challenge to drive future research.

Pragmatic Fairness: Developing Policies with Outcome Disparity Control

Jan 28, 2023

Limor Gultchin, Siyuan Guo, Alan Malek, Silvia Chiappa, Ricardo Silva

Abstract: We introduce a causal framework for designing optimal policies that satisfy fairness constraints. We take a pragmatic approach, asking what we can do with an action space available to us and only with access to historical data. We propose two different fairness constraints: a moderation breaking constraint, which aims at blocking moderation paths from the action and sensitive attribute to the outcome, and thereby at reducing disparity in outcome levels as much as the provided action space permits; and an equal benefit constraint, which aims at distributing gain from the new and maximized policy equally across sensitive attribute levels, and thus at keeping pre-existing preferential treatment in place or avoiding the introduction of new disparity. We introduce practical methods for implementing the constraints and illustrate their uses on experiments with semi-synthetic models.
