School of Optoelectronic Science and Engineering, Soochow University
Abstract:Foundation models pre-trained on large-scale natural image datasets offer a powerful paradigm for medical image segmentation. However, effectively transferring their learned representations for precise clinical applications remains a challenge. In this work, we propose Dino U-Net, a novel encoder-decoder architecture designed to exploit the high-fidelity dense features of the DINOv3 vision foundation model. Our architecture introduces an encoder built upon a frozen DINOv3 backbone, which employs a specialized adapter to fuse the model's rich semantic features with low-level spatial details. To preserve the quality of these representations during dimensionality reduction, we design a new fidelity-aware projection module (FAPM) that effectively refines and projects the features for the decoder. We conducted extensive experiments on seven diverse public medical image segmentation datasets. Our results show that Dino U-Net achieves state-of-the-art performance, consistently outperforming previous methods across various imaging modalities. Our framework proves to be highly scalable, with segmentation accuracy consistently improving as the backbone model size increases up to the 7-billion-parameter variant. The findings demonstrate that leveraging the superior, dense-pretrained features from a general-purpose foundation model provides a highly effective and parameter-efficient approach to advance the accuracy of medical image segmentation. The code is available at https://github.com/yifangao112/DinoUNet.
Abstract:Hyperspectral image denoising faces the challenge of multi-dimensional coupling of spatially non-uniform noise and spectral correlation interference. Existing deep learning methods mostly focus on RGB images and struggle to effectively handle the unique spatial-spectral characteristics and complex noise distributions of hyperspectral images (HSI). This paper proposes an HSI denoising framework, Hybrid-Domain Synergistic Transformer Network (HDST), based on frequency domain enhancement and multiscale modeling, achieving three-dimensional collaborative processing of spatial, frequency and channel domains. The method innovatively integrates three key mechanisms: (1) introducing an FFT preprocessing module with multi-band convolution to extract cross-band correlations and decouple spectral noise components; (2) designing a dynamic cross-domain attention module that adaptively fuses spatial domain texture features and frequency domain noise priors through a learnable gating mechanism; (3) building a hierarchical architecture where shallow layers capture global noise statistics using multiscale atrous convolution, and deep layers achieve detail recovery through frequency domain postprocessing. Experiments on both real and synthetic datasets demonstrate that HDST significantly improves denoising performance while maintaining computational efficiency, validating the effectiveness of the proposed method. This research provides new insights and a universal framework for addressing complex noise coupling issues in HSI and other high-dimensional visual data. The code is available at https://github.com/lhy-cn/HDST-HSIDenoise.
Abstract:Accurate preoperative assessment of lymph node (LN) metastasis in rectal cancer guides treatment decisions, yet conventional MRI evaluation based on morphological criteria shows limited diagnostic performance. While some artificial intelligence models have been developed, they often operate as black boxes, lacking the interpretability needed for clinical trust. Moreover, these models typically evaluate nodes in isolation, overlooking the patient-level context. To address these limitations, we introduce LRMR, an LLM-Driven Relational Multi-node Ranking framework. This approach reframes the diagnostic task from a direct classification problem into a structured reasoning and ranking process. The LRMR framework operates in two stages. First, a multimodal large language model (LLM) analyzes a composite montage image of all LNs from a patient, generating a structured report that details ten distinct radiological features. Second, a text-based LLM performs pairwise comparisons of these reports between different patients, establishing a relative risk ranking based on the severity and number of adverse features. We evaluated our method on a retrospective cohort of 117 rectal cancer patients. LRMR achieved an area under the curve (AUC) of 0.7917 and an F1-score of 0.7200, outperforming a range of deep learning baselines, including ResNet50 (AUC 0.7708). Ablation studies confirmed the value of our two main contributions: removing the relational ranking stage or the structured prompting stage led to a significant performance drop, with AUCs falling to 0.6875 and 0.6458, respectively. Our work demonstrates that decoupling visual perception from cognitive reasoning through a two-stage LLM framework offers a powerful, interpretable, and effective new paradigm for assessing lymph node metastasis in rectal cancer.
Abstract:Accurate lymph node metastasis (LNM) assessment in rectal cancer is essential for treatment planning, yet current MRI-based evaluation shows unsatisfactory accuracy, leading to suboptimal clinical decisions. Developing automated systems also faces significant obstacles, primarily the lack of node-level annotations. Previous methods treat lymph nodes as isolated entities rather than as an interconnected system, overlooking valuable spatial and contextual information. To solve this problem, we present WeGA, a novel weakly-supervised global-local affinity learning framework that addresses these challenges through three key innovations: 1) a dual-branch architecture with DINOv2 backbone for global context and residual encoder for local node details; 2) a global-local affinity extractor that aligns features across scales through cross-attention fusion; and 3) a regional affinity loss that enforces structural coherence between classification maps and anatomical regions. Experiments across one internal and two external test centers demonstrate that WeGA outperforms existing methods, achieving AUCs of 0.750, 0.822, and 0.802 respectively. By effectively modeling the relationships between individual lymph nodes and their collective context, WeGA provides a more accurate and generalizable approach for lymph node metastasis prediction, potentially enhancing diagnostic precision and treatment selection for rectal cancer patients.




Abstract:The game of Go has been highly under-researched due to the lack of game records and analysis tools. In recent years, the increasing number of professional competitions and the advent of AlphaZero-based algorithms provide an excellent opportunity for analyzing human Go games on a large scale. In this paper, we present the ProfessionAl Go annotation datasEt (PAGE), containing 98,525 games played by 2,007 professional players and spans over 70 years. The dataset includes rich AI analysis results for each move. Moreover, PAGE provides detailed metadata for every player and game after manual cleaning and labeling. Beyond the preliminary analysis of the dataset, we provide sample tasks that benefit from our dataset to demonstrate the potential application of PAGE in multiple research directions. To the best of our knowledge, PAGE is the first dataset with extensive annotation in the game of Go. This work is an extended version of [1] where we perform a more detailed description, analysis, and application.