Abstract:Background: Colorectal cancer (CRC) remains a leading cause of cancer-related mortality worldwide. Accurate survival prediction is essential for treatment stratification, yet existing pathology foundation models often overlook organ-specific features critical for CRC prognostication. Methods: We propose MorphDistill, a two-stage framework that distills complementary knowledge from multiple pathology foundation models into a compact CRC-specific encoder. In Stage I, a student encoder is trained using dimension-agnostic multi-teacher relational distillation with supervised contrastive regularization on large-scale colorectal datasets. This preserves inter-sample relationships from ten foundation models without explicit feature alignment. In Stage II, the encoder extracts patch-level features from whole-slide images, which are aggregated via attention-based multiple instance learning to predict five-year survival. Results: On the Alliance/CALGB 89803 cohort (n=424, stage III CRC), MorphDistill achieves an AUC of 0.68 (SD 0.08), an approximately 8% relative improvement over the strongest baseline (AUC 0.63). It also attains a C-index of 0.661 and a hazard ratio of 2.52 (95% CI: 1.73-3.65), outperforming all baselines. On an external TCGA cohort (n=562), it achieves a C-index of 0.628, demonstrating strong generalization across datasets and robustness across clinical subgroups. Conclusion: MorphDistill enables task-specific representation learning by integrating knowledge from multiple foundation models into a unified encoder. This approach provides an efficient strategy for prognostic modeling in computational pathology, with potential for broader oncology applications. Further validation across additional cohorts and disease stages is warranted.




Abstract:Although multiple instance learning (MIL) methods are widely used for automatic tumor detection on whole slide images (WSI), they suffer from the extreme class imbalance within the small tumor WSIs. This occurs when the tumor comprises only a few isolated cells. For early detection, it is of utmost importance that MIL algorithms can identify small tumors, even when they are less than 1% of the size of the WSI. Existing studies have attempted to address this issue using attention-based architectures and instance selection-based methodologies, but have not yielded significant improvements. This paper proposes cross-attention-based salient instance inference MIL (CASiiMIL), which involves a novel saliency-informed attention mechanism, to identify breast cancer lymph node micro-metastasis on WSIs without the need for any annotations. Apart from this new attention mechanism, we introduce a negative representation learning algorithm to facilitate the learning of saliency-informed attention weights for improved sensitivity on tumor WSIs. The proposed model outperforms the state-of-the-art MIL methods on two popular tumor metastasis detection datasets, and demonstrates great cross-center generalizability. In addition, it exhibits excellent accuracy in classifying WSIs with small tumor lesions. Moreover, we show that the proposed model has excellent interpretability attributed to the saliency-informed attention weights. We strongly believe that the proposed method will pave the way for training algorithms for early tumor detection on large datasets where acquiring fine-grained annotations is practically impossible.
Abstract:Multiple instance learning (MIL) models have achieved remarkable success in analyzing whole slide images (WSIs) for disease classification problems. However, with regard to gigapixel WSI classification problems, current MIL models are often incapable of differentiating a WSI with extremely small tumor lesions. This minute tumor-to-normal area ratio in a MIL bag inhibits the attention mechanism from properly weighting the areas corresponding to minor tumor lesions. To overcome this challenge, we propose salient instance inference MIL (SiiMIL), a weakly-supervised MIL model for WSI classification. Our method initially learns representations of normal WSIs, and it then compares the normal WSIs representations with all the input patches to infer the salient instances of the input WSI. Finally, it employs attention-based MIL to perform the slide-level classification based on the selected patches of the WSI. Our experiments imply that SiiMIL can accurately identify tumor instances, which could only take up less than 1% of a WSI, so that the ratio of tumor to normal instances within a bag can increase by two to four times. It is worth mentioning that it performs equally well for large tumor lesions. As a result, SiiMIL achieves a significant improvement in performance over the state-of-the-art MIL methods.