Abstract: Stains are essential in histopathology for visualizing specific tissue characteristics, with Haematoxylin and Eosin (H&E) serving as the clinical standard. However, pathologists frequently use a variety of special stains to diagnose specific morphologies. Maintaining accurate stain metadata for these slides is critical for quality control in clinical archives and for the integrity of computational pathology datasets. In this work, we compare two approaches for automated stain classification from whole-slide images (WSIs), covering the 14 most commonly used special stains at our institute alongside standard and frozen-section H&E. We evaluate a Multi-Instance Learning (MIL) pipeline and a proposed lightweight thumbnail-based approach. On internal test data, MIL achieved the highest performance (macro F1: 0.941 for 16 classes; 0.969 for 14 merged classes), while the thumbnail approach remained competitive (0.897 and 0.953, respectively). On external TCGA data, the thumbnail model generalized better (weighted F1: 0.843 vs. 0.807 for MIL). The thumbnail approach also increased throughput by two orders of magnitude (5.635 vs. 0.018 slides/s for MIL with all patches). We conclude that thumbnail-based classification provides a scalable and robust solution for routine visual quality control in digital pathology workflows.
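The abstract reports macro F1 on internal data and weighted F1 on TCGA, and the two averages can diverge when classes are imbalanced. The following minimal Python sketch computes both from true and predicted stain labels using the standard definitions (per-class F1 = 2TP/(2TP+FP+FN); macro = unweighted mean over classes; weighted = mean weighted by each class's number of true slides). The function name `f1_summary` and the toy labels are hypothetical illustrations, not data or code from the study.

```python
from collections import Counter


def f1_summary(y_true, y_pred):
    """Per-class F1 plus macro and support-weighted averages."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    per_class = {}
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class[c] = 2 * tp[c] / denom if denom else 0.0
    support = Counter(y_true)
    # Macro: every class counts equally, so a rare stain weighs as much as H&E.
    macro = sum(per_class.values()) / len(labels)
    # Weighted: each class counts in proportion to its number of true slides.
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return per_class, macro, weighted


# Hypothetical toy labels (not from the study): one PAS slide misread as H&E.
y_true = ["H&E"] * 6 + ["PAS"] * 2 + ["Giemsa"] * 2
y_pred = ["H&E"] * 6 + ["PAS", "H&E"] + ["Giemsa"] * 2
per_class, macro, weighted = f1_summary(y_true, y_pred)
print(f"macro F1 = {macro:.3f}, weighted F1 = {weighted:.3f}")
# prints: macro F1 = 0.863, weighted F1 = 0.887
```

Because the single error falls in a small class, macro F1 drops more than weighted F1, which is worth keeping in mind when comparing the internal (macro) and TCGA (weighted) figures.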
Abstract: Accurate annotation of fixation type is a critical step in slide preparation for pathology laboratories. However, this manual process is error-prone, and such errors affect downstream analyses and diagnostic accuracy. Existing methods for verifying formalin-fixed paraffin-embedded (FFPE) and frozen-section (FS) fixation types typically require full-resolution whole-slide images (WSIs), which limits scalability for high-throughput quality control. We propose a deep-learning model that predicts fixation type from low-resolution pre-scan thumbnail images. The model was trained on WSIs from the TUM Institute of Pathology (n=1,200, Leica GT450DX) and evaluated on a class-balanced subset of The Cancer Genome Atlas (TCGA, n=8,800, Leica AT2), as well as on class-balanced datasets from Augsburg (n=695 [392 FFPE, 303 FS], Philips UFS) and Regensburg (n=202, 3DHISTECH P1000). Our model achieves an AUROC of 0.88 on TCGA, outperforming comparable pre-scan methods by 4.8%. On the Regensburg and Augsburg slides it achieves AUROCs of 0.72, underscoring the challenge posed by scanner-induced domain shifts. Furthermore, the model processes each slide in 21 ms, 400× faster than existing high-magnification, full-resolution methods, enabling rapid processing. This approach offers an efficient way to detect labelling errors without relying on high-magnification scans, making it a valuable tool for quality control in high-throughput pathology workflows. Future work will improve and evaluate the model's generalisation to additional scanner types. Our findings suggest that this method can increase accuracy and efficiency in digital pathology workflows and may be extended to other low-resolution slide annotations.
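AUROC, the metric reported for the FFPE-vs-FS task, can be read as the probability that a randomly chosen positive slide receives a higher model score than a randomly chosen negative slide. The minimal Python sketch below computes it directly from that pairwise definition (the Mann-Whitney formulation, with ties counted as half). The function name `auroc`, the scores, and the choice of FS as the positive class are hypothetical assumptions for illustration; the abstract does not specify them.

```python
def auroc(scores, labels):
    """AUROC via the Mann-Whitney pairwise formulation.

    scores: model outputs, higher = more likely positive.
    labels: 1 for the positive class, 0 for the negative class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical thumbnail scores; 1 = FS, 0 = FFPE (positive class is an assumption).
scores = [0.9, 0.8, 0.4, 0.35, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0]
auc = auroc(scores, labels)
print(f"AUROC = {auc:.3f}")  # prints: AUROC = 0.889 (8 of 9 pairs ranked correctly)
```

The quadratic pair loop is only for clarity; for thousands of slides, a rank-based computation or a library routine such as scikit-learn's `roc_auc_score` is preferable. For scale, the stated 21 ms per slide corresponds to roughly 48 slides/s (1000/21 ≈ 47.6).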