Abstract:Background: Randomized controlled trials (RCTs) are costly, time-consuming, and often infeasible, while treatment-effect estimation from observational data is limited by unobserved confounding. Methods: We developed a three-step framework to address unobserved confounding in observational survival data. First, we infer a latent prognostic factor (U) from restricted mean survival time (RMST) discrepancies between patients with similar observed factors, the same treatment, and divergent outcomes, leveraging the idea that the aggregate effect of unmeasured factors can be inferred even if individual factors cannot. Second, we balance U with observed baseline covariates using prognostic matching, entropy balancing, or inverse probability of treatment weighting. Third, we apply multivariable survival analysis to estimate hazard ratios (HRs). We evaluated the framework in three observational cohorts with RCT benchmarks, two RCT cohorts, and six multicenter observational cohorts. Results: In three observational cohorts (nine comparisons), balancing U improved agreement with trial HRs in all cases; in the strongest settings, it reduced absolute log-HR error by approximately ten-fold versus using observed covariates alone (mean reduction 0.344; p=0.001). In two RCT cohorts, U was balanced across arms (most SMDs <0.1) and adjustment had minimal impact on log-HRs (mean absolute change 0.08). Across six multicenter cohorts, balancing U within centers reduced cross-center dispersion in chemotherapy log-HR estimates (mean reduction 0.147; p=0.016); when populations were directly balanced across centers to account for case-mix differences, cross-center survival differences were narrowed in 75%-100% of comparisons. Conclusions: Inferring and balancing a latent prognostic signal may reduce unobserved confounding and improve treatment-effect estimation from real-world data.




Abstract:We propose a prognostic stratum matching framework that addresses the deficiencies of Randomized trial data subgroup analysis and transforms ObservAtional Data to be used as if they were randomized, thus paving the road for precision medicine. Our approach counters the effects of unobserved confounding in observational data by correcting the estimated probabilities of the outcome under a treatment through a novel two-step process. These probabilities are then used to train Optimal Policy Trees (OPTs), which are decision trees that optimally assign treatments to subgroups of patients based on their characteristics. This facilitates the creation of clinically intuitive treatment recommendations. We applied our framework to observational data of patients with gastrointestinal stromal tumors (GIST) and validated the OPTs in an external cohort using the sensitivity and specificity metrics. We show that these recommendations outperformed those of experts in GIST. We further applied the same framework to randomized clinical trial (RCT) data of patients with extremity sarcomas. Remarkably, despite the initial trial results suggesting that all patients should receive treatment, our framework, after addressing imbalances in patient distribution due to the trial's small sample size, identified through the OPTs a subset of patients with unique characteristics who may not require treatment. Again, we successfully validated our recommendations in an external cohort.