Abstract: Biochemical recurrence (BCR) after radical prostatectomy is a critical endpoint in prostate cancer, yet risk stratification relies almost entirely on clinical variables dominated by Gleason grade. Whether H&E whole slide images (WSIs) carry prognostic signal beyond grade, and whether multiple instance learning (MIL) can recover it, remains unsettled. A key obstacle is that many pipelines select model checkpoints on the evaluation fold, which artificially inflates concordance. We construct a rigorous benchmark on TCGA-PRAD (487 patients, 101 BCR events) using strict out-of-fold scoring over five-fold cross-validation repeated across five seeds. The choice of MIL aggregator (ABMIL, CLAM, TransMIL, PatchGCN) has little effect (C-index 0.61-0.64 with UNI2-h), while the feature extractor is the dominant factor (ResNet50 0.566 versus pathology foundation models up to 0.639). A clinical Cox model on grade, stage, and age reaches 0.687; no imaging-only model significantly outperforms it (p > 0.10). We introduce Grade-Disentangled MIL (GD-MIL), a gated-attention MIL encoder trained with a gradient-reversal grade adversary that encourages the slide representation to be invariant to Gleason grade before late fusion with clinical variables. GD-MIL achieves C-index 0.704, significantly outperforming both the clinical baseline (delta-c = +0.029, p = 0.0005) and the best imaging-only model (delta-c = +0.062, p = 0.039), suggesting that H&E morphology contains prognostic information complementary to grade. A median split on predicted risk separates BCR-free survival (log-rank p < 0.0001; ~20% vs ~70% at five years).
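The concordance index (C-index) reported throughout the abstract above can be illustrated with a minimal sketch of Harrell's estimator. This is not the authors' evaluation code: the function name `concordance_index`, the handling of tied risks (half credit), and the choice to skip pairs with tied times are assumptions made for illustration. In a strict out-of-fold protocol, the `risks` passed in would come only from models that never saw the corresponding patient during training or checkpoint selection.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk.

    times  -- follow-up time per patient (time to BCR or to censoring)
    events -- 1 if BCR was observed, 0 if the patient was censored
    risks  -- predicted risk score (higher means earlier expected BCR)
    Illustrative sketch; tie handling here is an assumption.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[j] > times[i]:  # j was still event-free when i recurred
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # higher risk recurred first: correct
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risks get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort: the third patient is censored at t=6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.3, 0.1])
reversed_ = concordance_index(times, events, [0.1, 0.3, 0.7, 0.9])
constant = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

On this toy cohort a perfectly ordered risk score gives 1.0, a reversed one gives 0.0, and an uninformative constant score gives 0.5, which is the chance level against which values such as 0.566 or 0.704 are read.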
Abstract: Predicting microsatellite instability (MSI) status from routine hematoxylin and eosin (H&E) whole slide images (WSIs) offers a practical alternative to molecular testing, but models trained at one institution tend to generalize poorly to slides acquired at a different site. Foundation model representations, despite their generality, still encode site-specific texture alongside the conserved biological morphology underlying MSI. We investigate whether tile-level spatial priors derived from known MSI histology can guide these representations toward more site-invariant features. We introduce a biologically motivated spatial prior based on peripheral distance encoding, reflecting the Crohn's-like peripheral lymphocytic reaction at the tumor invasive margin, and evaluate a secondary local immune neighborhood encoding reflecting the lymphocyte-to-tumor ratio in each tile's immediate spatial neighborhood. Both priors are injected into a TransMIL aggregator before self-attention, allowing the transformer to integrate spatial biological context with UNI2-h or Virchow2 features across all attention layers. We evaluate six combinations of foundation model and MIL aggregator as reference configurations, then assess the effect of each spatial prior. With training on TCGA-COAD (137 slides) and external evaluation on TCGA-READ (50 slides) without retraining, peripheral distance encoding achieves MSI AUC 0.959 +/- 0.012 on COAD and MSS specificity 1.000 on READ, compared with 0.957 and 0.939 for the strongest reference configuration. Local immune neighborhood encoding achieves comparable internal AUC but lower cross-site specificity, suggesting that margin proximity encodes a more site-invariant biological signal than local immune density. These results suggest that biologically grounded spatial priors act as regularizers that reduce reliance on site-specific imaging patterns.
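One way to picture a peripheral distance prior is as a per-tile depth from the tumor margin computed on the tile grid. The sketch below is a hypothetical illustration, not the method from the abstract above, which does not specify how the distance is computed or embedded. It assumes a binary tumor mask on a rectangular tile grid, 4-connected grid steps as the distance metric, and the slide edge counting as margin; the names `peripheral_distance` and `normalized_depth` are invented here.

```python
from collections import deque


def peripheral_distance(tumor_mask):
    """Grid distance from each tile to the nearest non-tumor tile.

    tumor_mask -- list of rows, truthy where a tile is tumor (assumed input).
    Non-tumor tiles get 0; tumor tiles on the margin get 1, and so on inward.
    """
    rows, cols = len(tumor_mask), len(tumor_mask[0])
    # Pad with a ring of non-tumor tiles so the slide edge counts as margin
    # (an assumption of this sketch).
    padded = [[0] * (cols + 2)]
    padded += [[0] + [1 if v else 0 for v in row] + [0] for row in tumor_mask]
    padded += [[0] * (cols + 2)]
    # -1 marks tumor tiles whose distance is not yet known.
    dist = [[-1 if v else 0 for v in row] for row in padded]
    queue = deque(
        (r, c)
        for r in range(rows + 2)
        for c in range(cols + 2)
        if dist[r][c] == 0
    )
    # Multi-source breadth-first search outward from all non-tumor tiles.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows + 2 and 0 <= nc < cols + 2 and dist[nr][nc] == -1:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    # Strip the padding ring before returning.
    return [row[1:-1] for row in dist[1:-1]]


def normalized_depth(dist):
    """Scale distances to [0, 1] so slides of different size are comparable."""
    peak = max(max(row) for row in dist)
    if peak == 0:
        return [[0.0 for _ in row] for row in dist]
    return [[d / peak for d in row] for row in dist]


# 5x5 grid with a 3x3 tumor block in the center.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
depth = peripheral_distance(mask)
scaled = normalized_depth(depth)
```

In this toy grid the margin tiles of the tumor block receive depth 1 and the central tile depth 2. A scalar like this could then be mapped to an embedding and combined with tile features before self-attention, though how that injection is done in the reported model is not stated in the abstract.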