Abstract:Digital pathology archives increasingly contain multiple whole-slide images (WSIs) per case, capturing spatially distinct tumour regions and reflecting intrinsic morphological heterogeneity. However, most existing approaches rely on a single pathologist-selected slide, thereby discarding potentially informative evidence distributed across the remaining WSIs. To date, no autonomous framework has been proposed for comprehensive multi-WSI case processing. Here, we present an unsupervised framework for case-level analysis that integrates information from all available slides within a case. Rather than relying on a single designated slide, the proposed approach constructs case-level representations by selectively distilling informative patches across WSIs. We introduce Clustering-Based Redundancy-Reduced Instance Sampling for Pathology (CRISP), a two-stage framework that first reduces redundancy within individual WSIs and subsequently applies clustering-based sampling to select a compact yet representative set of patches for the entire case. The resulting patch set captures case-level heterogeneity while avoiding exhaustive processing of gigapixel images, and directly serves as a retrieval index. Using two Mayo Clinic breast cancer datasets for diagnosis and treatment planning, we demonstrate that CRISP consistently matches or surpasses the current standard practice of combined model and pathologist slide selection for patient/case search and retrieval. By automating case-level processing and eliminating subjective WSI selection, CRISP potentially enables the exploitation of clinically relevant information distributed across multiple WSIs that is currently overlooked.
Abstract:Digital pathology is revolutionizing the field of pathology by enabling the digitization, storage, and analysis of tissue samples as whole slide images (WSIs). WSIs are gigapixel files that capture the intricate details of tissue samples, providing a rich source of information for diagnostic and research purposes. However, due to their enormous size, representing these images as one compact vector is essential for many computational pathology tasks, such as search and retrieval, to ensure efficiency and scalability. Most current methods are "patch-oriented," meaning they divide WSIs into smaller patches for processing, which prevents a holistic analysis of the entire slide. Additionally, the necessity for compact representation is driven by the expensive high-performance storage required for WSIs. Not all hospitals have access to such extensive storage solutions, leading to potential disparities in healthcare quality and accessibility. This paper provides an overview of existing set-based approaches to single-vector WSI representation, highlighting the innovations that allow for more efficient and effective use of these complex images in digital pathology, thus addressing both computational challenges and storage limitations.