Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:On the Risk of Misleading Reports: Diagnosing Textual Biases in Multimodal Clinical AI

Jul 31, 2025

David Restrepo, Ira Ktena, Maria Vakalopoulou, Stergios Christodoulidis, Enzo Ferrante

Figure 1 for On the Risk of Misleading Reports: Diagnosing Textual Biases in Multimodal Clinical AI

Figure 2 for On the Risk of Misleading Reports: Diagnosing Textual Biases in Multimodal Clinical AI

Figure 3 for On the Risk of Misleading Reports: Diagnosing Textual Biases in Multimodal Clinical AI

Figure 4 for On the Risk of Misleading Reports: Diagnosing Textual Biases in Multimodal Clinical AI

Share this with someone who'll enjoy it:

Abstract:Clinical decision-making relies on the integrated analysis of medical images and the associated clinical reports. While Vision-Language Models (VLMs) can offer a unified framework for such tasks, they can exhibit strong biases toward one modality, frequently overlooking critical visual cues in favor of textual information. In this work, we introduce Selective Modality Shifting (SMS), a perturbation-based approach to quantify a model's reliance on each modality in binary classification tasks. By systematically swapping images or text between samples with opposing labels, we expose modality-specific biases. We assess six open-source VLMs-four generalist models and two fine-tuned for medical data-on two medical imaging datasets with distinct modalities: MIMIC-CXR (chest X-ray) and FairVLMed (scanning laser ophthalmoscopy). By assessing model performance and the calibration of every model in both unperturbed and perturbed settings, we reveal a marked dependency on text input, which persists despite the presence of complementary visual information. We also perform a qualitative attention-based analysis which further confirms that image content is often overshadowed by text details. Our findings highlight the importance of designing and evaluating multimodal medical models that genuinely integrate visual and textual cues, rather than relying on single-modality signals.

* Accepted to MICCAI 2025 1st Workshop on Multimodal Large Language Models (MLLMs) in Clinical Practice

View paper on

Share this with someone who'll enjoy it:

Title:On the Risk of Misleading Reports: Diagnosing Textual Biases in Multimodal Clinical AI

Paper and Code