Abstract:Obtaining pixel-level annotations in the medical domain is both expensive and time-consuming, often requiring close collaboration between clinical experts and developers. Semi-supervised medical image segmentation aims to leverage limited annotated data alongside abundant unlabeled data to achieve accurate segmentation. However, existing semi-supervised methods often struggle to structure semantic distributions in the latent space due to noise introduced by pseudo-labels. In this paper, we propose a novel diffusion-based framework for semi-supervised medical image segmentation. Our method introduces a constraint into the latent structure of semantic labels during the denoising diffusion process by enforcing prototype-based contrastive consistency. Rather than explicitly delineating semantic boundaries, the model leverages class prototypes centralized semantic representations in the latent space as anchors. This strategy improves the robustness of dense predictions, particularly in the presence of noisy pseudo-labels. We also introduce a new publicly available benchmark: Multi-Object Segmentation in X-ray Angiography Videos (MOSXAV), which provides detailed, manually annotated segmentation ground truth for multiple anatomical structures in X-ray angiography videos. Extensive experiments on the EndoScapes2023 and MOSXAV datasets demonstrate that our method outperforms state-of-the-art medical image segmentation approaches under the semi-supervised learning setting. This work presents a robust and data-efficient diffusion model that offers enhanced flexibility and strong potential for a wide range of clinical applications.
Abstract:Objective: Interventional devices, catheters and insertable imaging devices such as transesophageal echo (TOE) probes are routinely used in minimally invasive cardiovascular procedures. Detecting their positions and orientations in X-ray fluoroscopic images is important for many clinical applications. Method: In this paper, a novel attention mechanism was designed to guide a convolution neural network (CNN) model to the areas of wires in X-ray images, as nearly all interventional devices and catheters used in cardiovascular procedures contain wires. The attention mechanism includes multi-scale Gaussian derivative filters and a dot-product-based attention layer. By utilizing the proposed attention mechanism, a lightweight foundation model can be created to detect multiple objects simultaneously with higher precision and real-time speed. Results: The proposed model was trained and tested on a total of 12,438 X-ray images. An accuracy of 0.88 was achieved for detecting an echo probe and 0.87 for detecting an artificial valve at 58 FPS. The accuracy was measured by intersection-over-union (IoU). We also achieved a 99.8% success rate in detecting a 10-electrode catheter and a 97.8% success rate in detecting an ablation catheter. Conclusion: Our detection foundation model can simultaneously detect and identify both interventional devices and flexible catheters in real-time X-ray fluoroscopic images. Significance: The proposed model employs a novel attention mechanism to achieve high-performance object detection, making it suitable for various clinical applications and robotic-assisted surgeries. Codes are available at https://github.com/YingLiangMa/AttWire.
Abstract:Automated detection and segmentation of surgical devices, such as catheters or wires, in X-ray fluoroscopic images have the potential to enhance image guidance in minimally invasive heart surgeries. In this paper, we present a convolutional neural network model that integrates a resnet architecture with multiple prediction heads to achieve real-time, accurate localization of electrodes on catheters and catheter segmentation in an end-to-end deep learning framework. We also propose a multi-task learning strategy in which our model is trained to perform both accurate electrode detection and catheter segmentation simultaneously. A key challenge with this approach is achieving optimal performance for both tasks. To address this, we introduce a novel multi-level dynamic resource prioritization method. This method dynamically adjusts sample and task weights during training to effectively prioritize more challenging tasks, where task difficulty is inversely proportional to performance and evolves throughout the training process. Experiments on both public and private datasets have demonstrated that the accuracy of our method surpasses the existing state-of-the-art methods in both single segmentation task and in the detection and segmentation multi-task. Our approach achieves a good trade-off between accuracy and efficiency, making it well-suited for real-time surgical guidance applications.