Abstract: Endoscopic surgery relies on intraoperative video, making image quality a decisive factor for surgical safety and efficacy. Yet endoscopic videos are often degraded by uneven illumination, tissue scattering, occlusions, and motion blur, which obscure critical anatomical details and complicate surgical manipulation. Although deep learning-based methods have shown promise for image enhancement, most existing approaches remain too computationally demanding for real-time surgical use. To address this challenge, we propose a degradation-aware framework for endoscopic video enhancement that enables real-time, high-quality enhancement by propagating degradation representations across frames. In our framework, degradation representations are first extracted from images using contrastive learning. We then introduce a fusion mechanism that modulates image features with these representations to guide a single-frame enhancement model, which is trained with a cycle-consistency constraint between degraded and restored images to improve robustness and generalization. Experiments demonstrate that our framework achieves a superior balance between performance and efficiency compared with several state-of-the-art methods. These results highlight the effectiveness of degradation-aware modeling for real-time endoscopic video enhancement and suggest that implicitly learning and propagating degradation representations offers a practical pathway toward clinical application.
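
The fusion mechanism below is a minimal sketch, not the authors' released implementation: it assumes a FiLM-style modulation in PyTorch in which a per-frame degradation embedding (e.g., produced by a contrastively trained encoder and propagated across frames) predicts per-channel scale and shift for the backbone's image features. All class and variable names are hypothetical.

```python
# Minimal sketch (hypothetical, not the authors' code): modulating image
# features with a degradation embedding via learned per-channel scale/shift.
import torch
import torch.nn as nn

class DegradationModulation(nn.Module):
    def __init__(self, feat_channels: int, embed_dim: int):
        super().__init__()
        # Map the degradation embedding to per-channel scale and shift.
        self.to_scale = nn.Linear(embed_dim, feat_channels)
        self.to_shift = nn.Linear(embed_dim, feat_channels)

    def forward(self, feats: torch.Tensor, degr_embed: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) features from the single-frame enhancement backbone
        # degr_embed: (B, D) degradation representation for the current frame
        scale = self.to_scale(degr_embed).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        shift = self.to_shift(degr_embed).unsqueeze(-1).unsqueeze(-1)
        return feats * (1.0 + scale) + shift

# Usage: modulate backbone features with a per-frame degradation embedding.
mod = DegradationModulation(feat_channels=64, embed_dim=128)
feats = torch.randn(2, 64, 120, 160)
degr = torch.randn(2, 128)
out = mod(feats, degr)  # (2, 64, 120, 160)
```

One appeal of such a design is that the embedding can be computed once and reused or cheaply updated for subsequent frames, keeping the per-frame cost close to that of the single-frame enhancement model alone.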




Abstract: Rotation-invariant recognition of shapes is a common challenge in computer vision. Recent approaches have significantly improved the accuracy of rotation-invariant recognition by encoding the rotational invariance of shapes as hand-crafted image features and introducing deep neural networks. However, pixel-based methods carry substantial redundant information, and critical geometric information tends to be lost early in the network, resulting in weak rotation-invariant recognition of fine-grained shapes. In this paper, we reconsider the shape recognition problem from the perspective of contour points rather than pixels. We propose an anti-noise, rotation-invariant convolution module based on contour geometric awareness for fine-grained shape recognition. The module divides the shape contour into multiple local geometric regions (LGA), within which we perform finer-grained rotation-invariant encoding in terms of point topological relations. We build a deep network composed of five such cascaded modules for classification and retrieval experiments. The results show that our method achieves excellent performance in rotation-invariant recognition of fine-grained shapes. In addition, we demonstrate that our method is robust to contour noise and to the choice of rotation center. The source code is available at https://github.com/zhenguonie/ANRICN_CGA.
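
As a rough illustration of contour-level rotation-invariant encoding, the sketch below (hypothetical, not the released ANRICN_CGA code) builds a descriptor for one local geometric region purely from point relations that do not change under rotation: distances to the local centroid and angles between consecutive edge vectors.

```python
# Minimal sketch (hypothetical): a rotation- and translation-invariant
# descriptor for one local geometric region of an ordered contour.
import numpy as np

def local_region_descriptor(points: np.ndarray) -> np.ndarray:
    """points: (N, 2) ordered contour points of one local region."""
    centroid = points.mean(axis=0)
    # Distances to the centroid are invariant to rotation and translation.
    radii = np.linalg.norm(points - centroid, axis=1)
    # Angles between consecutive edge vectors are also invariant.
    prev_vec = points - np.roll(points, 1, axis=0)
    next_vec = np.roll(points, -1, axis=0) - points
    cos_angle = np.sum(prev_vec * next_vec, axis=1) / (
        np.linalg.norm(prev_vec, axis=1) * np.linalg.norm(next_vec, axis=1) + 1e-8
    )
    angles = np.arccos(np.clip(cos_angle, -1.0, 1.0))
    # Normalize radii by the region scale so the code is also scale-robust.
    radii = radii / (radii.max() + 1e-8)
    return np.concatenate([radii, angles])  # (2N,) rotation-invariant code

# Sanity check: rotating the region leaves the descriptor unchanged.
rng = np.random.default_rng(0)
pts = rng.random((16, 2))
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
assert np.allclose(local_region_descriptor(pts),
                   local_region_descriptor(pts @ rot.T), atol=1e-5)
```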
Abstract: In the realm of 3D reconstruction from 2D images, a persistent challenge is to achieve high-precision reconstructions without relying on 3D ground-truth data. We present UNeR3D, a pioneering unsupervised method that sets a new standard for generating detailed 3D reconstructions solely from 2D views. Our model significantly reduces the training costs associated with supervised approaches and introduces RGB coloration to 3D point clouds, enriching the visual experience. Employing an inverse distance weighting technique for color rendering, UNeR3D ensures smooth color transitions and enhances visual fidelity. The model's flexible architecture supports training with any number of views, and at inference time it is not constrained by the number of views seen during training: it can reconstruct from an arbitrary number of views, offering great versatility. Additionally, the model's continuous spatial input domain allows point clouds to be generated at any desired resolution, enabling the creation of high-resolution 3D RGB point clouds. We strengthen the reconstruction process with a novel multi-view geometric loss and a color loss, and demonstrate that our model excels with single-view inputs and beyond, reshaping the paradigm of unsupervised learning in 3D vision. Our contributions mark a substantial step forward in 3D vision, opening new horizons for content creation across diverse applications. Code is available at https://github.com/HongbinLin3589/UNeR3D.
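
The inverse-distance-weighting color rendering named in the abstract can be illustrated with a short sketch (hypothetical names, not the UNeR3D release): the color at a query location is a weighted average of the k nearest predicted points' colors, with weights proportional to inverse distance raised to a power, which produces the smooth color transitions described.

```python
# Minimal sketch (hypothetical): inverse distance weighting (IDW) of the
# k nearest predicted points' colors at an arbitrary 3D query location.
import numpy as np

def idw_color(query: np.ndarray, points: np.ndarray, colors: np.ndarray,
              k: int = 8, power: float = 2.0) -> np.ndarray:
    """query: (3,), points: (M, 3), colors: (M, 3) RGB in [0, 1]."""
    dists = np.linalg.norm(points - query, axis=1)
    nearest = np.argsort(dists)[:k]
    d = dists[nearest]
    # If the query coincides with a stored point, return its color directly.
    if np.isclose(d[0], 0.0):
        return colors[nearest[0]]
    weights = 1.0 / (d ** power)
    weights /= weights.sum()
    return weights @ colors[nearest]

# Usage: interpolate a color at an arbitrary 3D location.
rng = np.random.default_rng(0)
pts = rng.random((1000, 3))
cols = rng.random((1000, 3))
print(idw_color(np.array([0.5, 0.5, 0.5]), pts, cols))
```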