Abstract: Despite advances in machine learning-based medical image classifiers, the safety and reliability of these systems remain major concerns in practical settings. Existing auditing approaches rely mainly on unimodal features or metadata-based subgroup analyses, which offer limited interpretability and often fail to capture hidden systematic failures. To address these limitations, we introduce the first automated auditing framework that extends slice discovery methods to multimodal representations specifically for medical applications. We conduct comprehensive experiments under common failure scenarios using the MIMIC-CXR-JPG dataset and demonstrate the framework's strong capability in both failure discovery and explanation generation. Our results also show that multimodal information generally enables more comprehensive and effective auditing of classifiers, while unimodal variants beyond image-only inputs show strong potential in resource-constrained scenarios.
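
The idea of slice discovery over multimodal representations can be illustrated with a minimal sketch. It is not the framework described in the abstract, whose components are not specified here. It assumes precomputed image and text embeddings (hypothetical arrays `img_emb` and `txt_emb`), fuses them by simple concatenation, groups them with a basic k-means, and ranks the resulting groups by classifier error rate. The function names, the fusion strategy, and the clustering method are all illustrative assumptions.

```python
import numpy as np


def kmeans(x, k, n_iter=50):
    """Basic k-means with deterministic farthest-point initialisation."""
    centers = [x[0]]
    for _ in range(1, k):
        # Squared distance of every point to its nearest chosen centre.
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers, dtype=float)

    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels


def l2_normalise(x):
    """Scale each row to unit length so neither modality dominates."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def discover_slices(img_emb, txt_emb, correct, k=4):
    """Cluster fused embeddings and rank clusters by error rate.

    img_emb, txt_emb: (n, d_img) and (n, d_txt) arrays (assumed precomputed).
    correct: boolean array of length n, True where the audited classifier
             was right.
    Returns the cluster labels and a list of (cluster_id, size, error_rate)
    tuples sorted from highest to lowest error rate.
    """
    fused = np.concatenate(
        [l2_normalise(img_emb), l2_normalise(txt_emb)], axis=1
    )
    labels = kmeans(fused, k)
    slices = []
    for j in range(k):
        mask = labels == j
        if mask.any():
            error_rate = float(1.0 - correct[mask].mean())
            slices.append((j, int(mask.sum()), error_rate))
    slices.sort(key=lambda s: s[2], reverse=True)
    return labels, slices
```

In this sketch, the clusters at the top of the returned list are candidate failure slices. A unimodal variant follows by passing only one of the two embedding matrices into the clustering step.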




Abstract: Despite current advances in deep learning, domain shift remains a common problem in medical imaging settings. Recent findings on natural images suggest that deep neural models can show a textural bias when carrying out image classification tasks, which runs counter to the common understanding that convolutional neural networks (CNNs) recognise objects through increasingly complex representations of shape. Drawing on these findings, this study investigates whether addressing the textural bias phenomenon can improve the robustness and transferability of deep segmentation models applied to three-dimensional (3D) medical data. To this end, publicly available MRI scans from the Developing Human Connectome Project are used to examine how simulating textural noise can help train robust models for a complex segmentation task. Our findings illustrate how applying specific types of textural filters prior to training the models can increase their ability to segment scans corrupted by previously unseen noise.
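
As a rough illustration of applying textural filters to 3D volumes before training, the sketch below perturbs the texture of a scan while leaving its segmentation labels untouched. The abstract does not name the filters used in the study, so the choices here (a box mean filter and additive Gaussian noise) and all names and parameters (`box_blur_3d`, `texture_augment`, `p_blur`, `noise_sigma`) are assumptions made for the example only.

```python
import numpy as np


def box_blur_3d(volume, radius=1):
    """Mean filter over a (2r+1)^3 neighbourhood with edge padding."""
    size = 2 * radius + 1
    padded = np.pad(volume.astype(float), radius, mode="edge")
    out = np.zeros(volume.shape, dtype=float)
    nz, ny, nx = volume.shape
    # Sum every shifted copy of the padded volume, then average.
    for dz in range(size):
        for dy in range(size):
            for dx in range(size):
                out += padded[dz:dz + nz, dy:dy + ny, dx:dx + nx]
    return out / size ** 3


def add_gaussian_noise(volume, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return volume + rng.normal(0.0, sigma, size=volume.shape)


def texture_augment(volume, rng, p_blur=0.5, noise_sigma=0.05):
    """Randomly smooth the volume, then add noise.

    Only the image is altered; the corresponding label map should be
    passed to training unchanged so the shape information is preserved.
    """
    out = volume.astype(float)
    if rng.random() < p_blur:
        out = box_blur_3d(out, radius=1)
    return add_gaussian_noise(out, noise_sigma, rng)
```

A training pipeline could call `texture_augment(scan, rng)` on each scan with `rng = np.random.default_rng(seed)`, on the assumption that intensities have already been normalised to a range where `noise_sigma` is meaningful.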