Abstract: Causal inference, the estimation of causal effects from observational data, is a fundamental tool in many disciplines. Of particular importance across a variety of domains is the continuous treatment setting, where the intervention variable takes values in a continuous range. This setting is far less explored than the binary treatment setting and departs from it substantially, because models must represent effects across a continuum of treatment values. In this paper, we present the first causal foundation model for the continuous treatment setting. Our model meta-learns the ability to predict causal effects across a wide variety of unseen tasks without additional training or fine-tuning. First, we design a novel prior over data-generating processes with continuous treatment variables in order to generate a rich causal training corpus. We then train a transformer to reconstruct individual treatment-response curves given only observational data, leveraging in-context learning to amortize expensive Bayesian posterior inference. Our model achieves state-of-the-art performance on individual treatment-response curve reconstruction tasks compared with causal models that are trained specifically for those tasks.
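To make the idea of a training corpus drawn from a prior over data-generating processes concrete, here is a minimal sketch of a single synthetic task with a continuous treatment. It is a hypothetical toy, not the prior described in the abstract: the function name `sample_task`, the mechanism parameters `a`, `b`, `c`, and the functional forms are all assumptions chosen for illustration. Each call draws random mechanism parameters, generates confounded observational data `(x, t, y)`, and evaluates the noise-free individual treatment-response curves on a treatment grid, which would serve as the reconstruction target.

```python
import numpy as np

def sample_task(rng, n=200, grid_size=25):
    """Draw one synthetic task: observational data plus true response curves.

    Hypothetical toy mechanisms, assumed for illustration only.
    """
    # Random mechanism parameters play the role of a draw from the prior.
    a, b, c = rng.normal(size=3)

    # Observed covariate that influences both treatment and outcome.
    x = rng.normal(size=n)

    # Continuous treatment, squashed into [-1, 1] and dependent on x.
    t = np.tanh(a * x + 0.5 * rng.normal(size=n))

    def outcome(t_val, x_val):
        # Nonlinear in the treatment, with an effect that varies with x.
        return np.sin(b * t_val) + c * x_val * t_val

    # Observed (noisy) outcomes at the treatment each unit actually received.
    y = outcome(t, x) + 0.1 * rng.normal(size=n)

    # Ground-truth response curve of every unit over a grid of treatment values.
    grid = np.linspace(-1.0, 1.0, grid_size)
    curves = outcome(grid[None, :], x[:, None])  # shape (n, grid_size)

    return {"x": x, "t": t, "y": y, "grid": grid, "curves": curves}

rng = np.random.default_rng(0)
task = sample_task(rng)
print(task["curves"].shape)
```

In a meta-learning setup of the kind the abstract describes, many such tasks would be sampled, with `(x, t, y)` given to the model as context and `curves` used as the training target; how the actual model encodes these inputs is not specified here.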
Abstract: The instrumental-variables (IV) setting is standard for partial identification of causal effects when unobserved confounding makes point identification impossible. Existing approaches face methodological bottlenecks: they require closed-form bound estimands (e.g., the Balke-Pearl equations in binary IV), and even when these are available, designing accurate estimators requires manual effort tailored to each estimand. Direct Bayesian inference of the causal effects, rather than of the bounds, circumvents these challenges, but it is often computationally intensive and suffers from high prior sensitivity or under-dispersed posteriors. As a remedy, we introduce IV-ICL, an amortized Bayesian in-context learning method that learns the marginal posterior distribution of the causal effects directly and derives bounds as its quantiles. Unlike standard variational inference, which optimizes the exclusive KL divergence, amortized Bayesian inference minimizes the expected inclusive KL divergence, a mass-covering objective. We empirically observe that optimizing the inclusive KL can recover the entire identified set across diverse data-generating processes, whereas exclusive-KL optimization (e.g., variational inference) of the same Bayesian formulation collapses onto a single mode and fails to cover the identified set. We evaluate IV-ICL on synthetic and semi-synthetic IV benchmarks and show that it produces intervals that are more reliably valid and more informative than those of efficient semi-parametric, Bayesian, and plug-in baselines, at 20-500x lower inference time. Beyond methodology, we propose a procedure to convert randomized controlled trials into IV benchmarks with provably preserved ground-truth causal effects, enabling a more realistic evaluation of partial-identification methods.
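The contrast between the two KL directions can be illustrated with a small numerical toy. This is not IV-ICL and involves no amortization or IV data; it only fits a single Gaussian q to a fixed bimodal target p (an assumed stand-in for a posterior with two well-separated modes) under each objective. For a Gaussian family, the inclusive KL(p || q) is minimized by matching the mean and variance of p, while the exclusive KL(q || p) is minimized here by a coarse grid search. All names and numbers in the sketch are assumptions made for illustration.

```python
import numpy as np

# Integration grid for numerical expectations.
x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]

def log_normal(x, mu, sigma):
    """Log-density of a univariate Gaussian."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)

# Bimodal target p: equal mixture of N(-2, 0.5^2) and N(2, 0.5^2).
log_p = np.logaddexp(log_normal(x, -2.0, 0.5), log_normal(x, 2.0, 0.5)) + np.log(0.5)
p = np.exp(log_p)

# Inclusive KL(p || q): the Gaussian minimizer matches the mean and variance of p.
mu_inc = np.sum(x * p) * dx
sigma_inc = np.sqrt(np.sum((x - mu_inc) ** 2 * p) * dx)

def exclusive_kl(mu, sigma):
    """Numerical KL(q || p) for a Gaussian q with the given mean and std."""
    log_q = log_normal(x, mu, sigma)
    q = np.exp(log_q)
    return np.sum(q * (log_q - log_p)) * dx

# Exclusive KL(q || p): coarse grid search over the Gaussian's mean and std.
candidates = [(m, s)
              for m in np.linspace(-3.0, 3.0, 61)
              for s in np.linspace(0.2, 3.0, 29)]
mu_exc, sigma_exc = min(candidates, key=lambda ms: exclusive_kl(*ms))

def central_interval(mu, sigma, z=1.96):
    """Central 95% interval of a Gaussian, i.e. its 2.5% and 97.5% quantiles."""
    return mu - z * sigma, mu + z * sigma

print("inclusive fit:", mu_inc, sigma_inc, central_interval(mu_inc, sigma_inc))
print("exclusive fit:", mu_exc, sigma_exc, central_interval(mu_exc, sigma_exc))
```

In this toy, the inclusive fit is a broad Gaussian whose central interval spans both modes, whereas the exclusive fit settles on one mode and its interval excludes the other. This mirrors, in a much simpler setting, the mass-covering versus mode-seeking behavior the abstract reports when bounds are read off as quantiles.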