Na Feng

MCA-RG: Enhancing LLMs with Medical Concept Alignment for Radiology Report Generation

Jul 09, 2025

Qilong Xing, Zikai Song, Youjia Zhang, Na Feng, Junqing Yu, Wei Yang

Abstract: Despite significant advances in adapting Large Language Models (LLMs) for radiology report generation (RRG), clinical adoption remains challenging because pathological and anatomical features are difficult to map accurately to their corresponding text descriptions. In addition, semantic-agnostic feature extraction further hampers the generation of accurate diagnostic reports. To address these challenges, we introduce Medical Concept Aligned Radiology Report Generation (MCA-RG), a knowledge-driven framework that explicitly aligns visual features with distinct medical concepts to enhance the report generation process. MCA-RG uses two curated concept banks: a pathology bank containing lesion-related knowledge, and an anatomy bank containing anatomical descriptions. The visual features are aligned with these medical concepts and undergo tailored enhancement. We further propose an anatomy-based contrastive learning procedure to improve the generalization of anatomical features, coupled with a matching loss for pathological features to prioritize clinically relevant regions. Additionally, a feature gating mechanism filters out low-quality concept features. Finally, the visual features, each corresponding to an individual medical concept, are leveraged to guide the report generation process. Experiments on two public benchmarks (MIMIC-CXR and CheXpert Plus) demonstrate that MCA-RG achieves superior performance, highlighting its effectiveness in radiology report generation.

* MICCAI 2025
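
The align-then-gate idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the cosine-similarity pooling, the hard `threshold` used as the gate, and the names `align_and_gate`, `patches`, and `concepts` are all assumptions made for illustration, and the abstract does not say how the actual gate is computed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def align_and_gate(patches, concepts, threshold=0.5):
    """Pool patch features per concept, then drop weakly supported concepts.

    patches:  list of visual feature vectors (hypothetical image patches)
    concepts: dict mapping a concept name to its embedding (hypothetical bank)
    A concept is kept only if its best patch similarity reaches `threshold`;
    this hard cut-off is an assumed stand-in for a learned gating mechanism.
    """
    kept = {}
    for name, emb in concepts.items():
        sims = [cosine(p, emb) for p in patches]
        weights = softmax(sims)
        # Similarity-weighted average of the patches: the concept-aligned feature.
        pooled = [
            sum(w * p[i] for w, p in zip(weights, patches))
            for i in range(len(emb))
        ]
        if max(sims) >= threshold:
            kept[name] = pooled
    return kept


if __name__ == "__main__":
    patches = [[1.0, 0.0], [0.9, 0.1]]
    concepts = {"effusion": [1.0, 0.0], "fracture": [0.0, 1.0]}
    print(sorted(align_and_gate(patches, concepts)))
```

In this toy input both patches point along the "effusion" direction, so that concept passes the gate while "fracture" is filtered out. A real system would presumably learn the gate rather than fix a threshold.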

SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding

Apr 10, 2025

Yangliu Hu, Zikai Song, Na Feng, Yawei Luo, Junqing Yu, Yi-Ping Phoebe Chen, Wei Yang

Abstract: Video-based Large Language Models (Video-LLMs) have advanced substantially in recent years, driven by progress in multi-modal LLMs. Although these models are proficient at describing videos as a whole, they struggle with fine-grained understanding, particularly with visual dynamics and queries about video details. To address these shortcomings, we find that fine-tuning Video-LLMs on self-supervised fragment tasks greatly improves their fine-grained video understanding abilities. We therefore make two key contributions: (1) Self-Supervised Fragment Fine-Tuning (SF$^2$T), a novel, effortless fine-tuning method that exploits the rich inherent characteristics of videos for training while unlocking more of the fine-grained understanding ability of Video-LLMs. It also relieves researchers of labor-intensive annotation and circumvents the limitations of natural language, which often fails to capture the complex spatiotemporal variations in videos; (2) a novel benchmark dataset, FineVidBench, for rigorously assessing Video-LLMs' performance at both the scene and fragment levels, offering a comprehensive evaluation of their capabilities. We assessed multiple models and validated the effectiveness of SF$^2$T on them. Experimental results show that our approach improves their ability to capture and interpret spatiotemporal details.

* Accepted to CVPR 2025
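
To make "self-supervised fragment task" concrete, here is a minimal sketch of how a training example can be built from a video with no human annotation. The abstract does not list the tasks SF$^2$T actually uses, so the frame-order restoration task below, along with the names `make_order_task` and `fragment_len`, is a hypothetical example and not the authors' method.

```python
import random


def make_order_task(frames, fragment_len=4, rng=None):
    """Build one frame-order restoration example from a list of frames.

    A contiguous fragment is sampled and shuffled. The label is derived from
    the shuffle itself, so no manual annotation is needed: label[i] is the
    position in the shuffled fragment that holds the i-th original frame.
    """
    rng = rng or random.Random()
    if len(frames) < fragment_len:
        raise ValueError("video is shorter than the requested fragment")
    start = rng.randrange(len(frames) - fragment_len + 1)
    fragment = frames[start:start + fragment_len]
    order = list(range(fragment_len))
    rng.shuffle(order)
    shuffled = [fragment[i] for i in order]
    label = [order.index(i) for i in range(fragment_len)]
    return {"frames": shuffled, "label": label}


if __name__ == "__main__":
    video = [f"frame_{i}" for i in range(10)]  # stand-in for decoded frames
    task = make_order_task(video, fragment_len=4, rng=random.Random(0))
    restored = [task["frames"][j] for j in task["label"]]
    print(task["frames"], "->", restored)
```

Reading the shuffled frames in the order given by `label` recovers the original fragment, which is the supervision signal a model would be fine-tuned to predict under this assumed task.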
