Abstract: Pre-training on image-text colonoscopy records offers substantial potential for improving endoscopic image analysis, but faces challenges including non-informative background images, complex medical terminology, and ambiguous multi-lesion descriptions. We introduce Endo-CLIP, a novel self-supervised framework that enhances Contrastive Language-Image Pre-training (CLIP) for this domain. Endo-CLIP's three-stage framework (cleansing, attunement, and unification) addresses these challenges by (1) removing background frames, (2) leveraging large language models to extract clinical attributes for fine-grained contrastive learning, and (3) employing patient-level cross-attention to resolve multi-polyp ambiguities. Extensive experiments demonstrate that Endo-CLIP significantly outperforms state-of-the-art pre-training methods in zero-shot and few-shot polyp detection and classification, paving the way for more accurate and clinically relevant endoscopic analysis.
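To make the unification stage concrete, the sketch below shows one plausible form of patient-level cross-attention: each lesion description from a patient's report attends over all informative image embeddings from that patient, so multi-polyp reports can be softly matched to the right frames. This is a minimal illustrative sketch in PyTorch, not the paper's implementation; the class name, embedding size, and pairing scheme are assumptions.

```python
# Illustrative sketch only: patient-level cross-attention where lesion-description
# embeddings (queries) attend over frame embeddings (keys/values) of the same patient.
# All names and dimensions are assumptions, not taken from the Endo-CLIP paper.
import torch
import torch.nn as nn


class PatientCrossAttention(nn.Module):
    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, text_emb: torch.Tensor, image_emb: torch.Tensor) -> torch.Tensor:
        # text_emb:  (num_lesion_descriptions, dim) - one embedding per described polyp
        # image_emb: (num_frames, dim)              - one embedding per informative frame
        q = text_emb.unsqueeze(0)    # treat one patient as a batch of size 1
        kv = image_emb.unsqueeze(0)
        fused, _ = self.attn(q, kv, kv)
        # fused: lesion-level representations aggregated from the best-matching frames,
        # which could then enter a contrastive objective against the text embeddings.
        return fused.squeeze(0)


if __name__ == "__main__":
    layer = PatientCrossAttention()
    text = torch.randn(3, 512)     # e.g. three polyp descriptions in one report
    images = torch.randn(20, 512)  # e.g. twenty informative frames for that patient
    print(layer(text, images).shape)  # torch.Size([3, 512])
```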