Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jiechao Gao

S2D-ALIGN: Shallow-to-Deep Auxiliary Learning for Anatomically-Grounded Radiology Report Generation

Nov 14, 2025

Jiechao Gao, Chang Liu, Yuangang Li

Figure 1 for S2D-ALIGN: Shallow-to-Deep Auxiliary Learning for Anatomically-Grounded Radiology Report Generation

Figure 2 for S2D-ALIGN: Shallow-to-Deep Auxiliary Learning for Anatomically-Grounded Radiology Report Generation

Figure 3 for S2D-ALIGN: Shallow-to-Deep Auxiliary Learning for Anatomically-Grounded Radiology Report Generation

Figure 4 for S2D-ALIGN: Shallow-to-Deep Auxiliary Learning for Anatomically-Grounded Radiology Report Generation

Abstract:Radiology Report Generation (RRG) aims to automatically generate diagnostic reports from radiology images. To achieve this, existing methods have leveraged the powerful cross-modal generation capabilities of Multimodal Large Language Models (MLLMs), primarily focusing on optimizing cross-modal alignment between radiographs and reports through Supervised Fine-Tuning (SFT). However, by only performing instance-level alignment with the image-text pairs, the standard SFT paradigm fails to establish anatomically-grounded alignment, where the templated nature of reports often leads to sub-optimal generation quality. To address this, we propose \textsc{S2D-Align}, a novel SFT paradigm that establishes anatomically-grounded alignment by leveraging auxiliary signals of varying granularities. \textsc{S2D-Align} implements a shallow-to-deep strategy, progressively enriching the alignment process: it begins with the coarse radiograph-report pairing, then introduces reference reports for instance-level guidance, and ultimately utilizes key phrases to ground the generation in specific anatomical details. To bridge the different alignment stages, we introduce a memory-based adapter that empowers feature sharing, thereby integrating coarse and fine-grained guidance. For evaluation, we conduct experiments on the public \textsc{MIMIC-CXR} and \textsc{IU X-Ray} benchmarks, where \textsc{S2D-Align} achieves state-of-the-art performance compared to existing methods. Ablation studies validate the effectiveness of our multi-stage, auxiliary-guided approach, highlighting a promising direction for enhancing grounding capabilities in complex, multi-modal generation tasks.

Via

Access Paper or Ask Questions

FT-MDT: Extracting Decision Trees from Medical Texts via a Novel Low-rank Adaptation Method

Oct 06, 2025

Yuheng Li, Jiechao Gao, Wei Han, Wenwen Ouyang, Wei Zhu, Hui Yi Leong

Abstract:Knowledge of the medical decision process, which can be modeled as medical decision trees (MDTs), is critical to building clinical decision support systems. However, current MDT construction methods rely heavily on time-consuming and laborious manual annotation. To address this challenge, we propose PI-LoRA (Path-Integrated LoRA), a novel low-rank adaptation method for automatically extracting MDTs from clinical guidelines and textbooks. We integrate gradient path information to capture synergistic effects between different modules, enabling more effective and reliable rank allocation. This framework ensures that the most critical modules receive appropriate rank allocations while less important ones are pruned, resulting in a more efficient and accurate model for extracting medical decision trees from clinical texts. Extensive experiments on medical guideline datasets demonstrate that our PI-LoRA method significantly outperforms existing parameter-efficient fine-tuning approaches for the Text2MDT task, achieving better accuracy with substantially reduced model complexity. The proposed method achieves state-of-the-art results while maintaining a lightweight architecture, making it particularly suitable for clinical decision support systems where computational resources may be limited.

* Accepted by EMNLP-2025 Industrial Track

Via

Access Paper or Ask Questions

AMAS: Adaptively Determining Communication Topology for LLM-based Multi-Agent System

Oct 02, 2025

Hui Yi Leong, Yuheng Li, Yuqing Wu, Wenwen Ouyang, Wei Zhu, Jiechao Gao

Abstract:Although large language models (LLMs) have revolutionized natural language processing capabilities, their practical implementation as autonomous multi-agent systems (MAS) for industrial problem-solving encounters persistent barriers. Conventional MAS architectures are fundamentally restricted by inflexible, hand-crafted graph topologies that lack contextual responsiveness, resulting in diminished efficacy across varied academic and commercial workloads. To surmount these constraints, we introduce AMAS, a paradigm-shifting framework that redefines LLM-based MAS through a novel dynamic graph designer. This component autonomously identifies task-specific optimal graph configurations via lightweight LLM adaptation, eliminating the reliance on monolithic, universally applied structural templates. Instead, AMAS exploits the intrinsic properties of individual inputs to intelligently direct query trajectories through task-optimized agent pathways. Rigorous validation across question answering, mathematical deduction, and code generation benchmarks confirms that AMAS systematically exceeds state-of-the-art single-agent and multi-agent approaches across diverse LLM architectures. Our investigation establishes that context-sensitive structural adaptability constitutes a foundational requirement for high-performance LLM MAS deployments.

Via

Access Paper or Ask Questions

Track Any Anomalous Object: A Granular Video Anomaly Detection Pipeline

Jun 05, 2025

Yuzhi Huang, Chenxin Li, Haitao Zhang, Zixu Lin, Yunlong Lin, Hengyu Liu, Wuyang Li, Xinyu Liu, Jiechao Gao, Yue Huang(+2 more)

Figure 1 for Track Any Anomalous Object: A Granular Video Anomaly Detection Pipeline

Figure 2 for Track Any Anomalous Object: A Granular Video Anomaly Detection Pipeline

Figure 3 for Track Any Anomalous Object: A Granular Video Anomaly Detection Pipeline

Figure 4 for Track Any Anomalous Object: A Granular Video Anomaly Detection Pipeline

Abstract:Video anomaly detection (VAD) is crucial in scenarios such as surveillance and autonomous driving, where timely detection of unexpected activities is essential. Although existing methods have primarily focused on detecting anomalous objects in videos -- either by identifying anomalous frames or objects -- they often neglect finer-grained analysis, such as anomalous pixels, which limits their ability to capture a broader range of anomalies. To address this challenge, we propose a new framework called Track Any Anomalous Object (TAO), which introduces a granular video anomaly detection pipeline that, for the first time, integrates the detection of multiple fine-grained anomalous objects into a unified framework. Unlike methods that assign anomaly scores to every pixel, our approach transforms the problem into pixel-level tracking of anomalous objects. By linking anomaly scores to downstream tasks such as segmentation and tracking, our method removes the need for threshold tuning and achieves more precise anomaly localization in long and complex video sequences. Experiments demonstrate that TAO sets new benchmarks in accuracy and robustness. Project page available online.

Via

Access Paper or Ask Questions

Federated Neural Architecture Search with Model-Agnostic Meta Learning

Apr 08, 2025

Xinyuan Huang, Jiechao Gao

Figure 1 for Federated Neural Architecture Search with Model-Agnostic Meta Learning

Figure 2 for Federated Neural Architecture Search with Model-Agnostic Meta Learning

Figure 3 for Federated Neural Architecture Search with Model-Agnostic Meta Learning

Figure 4 for Federated Neural Architecture Search with Model-Agnostic Meta Learning

Abstract:Federated Learning (FL) often struggles with data heterogeneity due to the naturally uneven distribution of user data across devices. Federated Neural Architecture Search (NAS) enables collaborative search for optimal model architectures tailored to heterogeneous data to achieve higher accuracy. However, this process is time-consuming due to extensive search space and retraining. To overcome this, we introduce FedMetaNAS, a framework that integrates meta-learning with NAS within the FL context to expedite the architecture search by pruning the search space and eliminating the retraining stage. Our approach first utilizes the Gumbel-Softmax reparameterization to facilitate relaxation of the mixed operations in the search space. We then refine the local search process by incorporating Model-Agnostic Meta-Learning, where a task-specific learner adapts both weights and architecture parameters (alphas) for individual tasks, while a meta learner adjusts the overall model weights and alphas based on the gradient information from task learners. Following the meta-update, we propose soft pruning using the same trick on search space to gradually sparsify the architecture, ensuring that the performance of the chosen architecture remains robust after pruning which allows for immediate use of the model without retraining. Experimental evaluations demonstrate that FedMetaNAS significantly accelerates the search process by more than 50\% with higher accuracy compared to FedNAS.

Via

Access Paper or Ask Questions

Learning Novel Transformer Architecture for Time-series Forecasting

Feb 19, 2025

Juyuan Zhang, Wei Zhu, Jiechao Gao

Figure 1 for Learning Novel Transformer Architecture for Time-series Forecasting

Figure 2 for Learning Novel Transformer Architecture for Time-series Forecasting

Figure 3 for Learning Novel Transformer Architecture for Time-series Forecasting

Figure 4 for Learning Novel Transformer Architecture for Time-series Forecasting

Abstract:Despite the success of Transformer-based models in the time-series prediction (TSP) tasks, the existing Transformer architecture still face limitations and the literature lacks comprehensive explorations into alternative architectures. To address these challenges, we propose AutoFormer-TS, a novel framework that leverages a comprehensive search space for Transformer architectures tailored to TSP tasks. Our framework introduces a differentiable neural architecture search (DNAS) method, AB-DARTS, which improves upon existing DNAS approaches by enhancing the identification of optimal operations within the architecture. AutoFormer-TS systematically explores alternative attention mechanisms, activation functions, and encoding operations, moving beyond the traditional Transformer design. Extensive experiments demonstrate that AutoFormer-TS consistently outperforms state-of-the-art baselines across various TSP benchmarks, achieving superior forecasting accuracy while maintaining reasonable training efficiency.

Via

Access Paper or Ask Questions

Adapting Large Language Models for Time Series Modeling via a Novel Parameter-efficient Adaptation Method

Feb 19, 2025

Juyuan Zhang, Wei Zhu, Jiechao Gao

Figure 1 for Adapting Large Language Models for Time Series Modeling via a Novel Parameter-efficient Adaptation Method

Figure 2 for Adapting Large Language Models for Time Series Modeling via a Novel Parameter-efficient Adaptation Method

Figure 3 for Adapting Large Language Models for Time Series Modeling via a Novel Parameter-efficient Adaptation Method

Figure 4 for Adapting Large Language Models for Time Series Modeling via a Novel Parameter-efficient Adaptation Method

Abstract:Time series modeling holds significant importance in many real-world applications and has been extensively studied. While pre-trained foundation models have made impressive strides in the fields of natural language processing (NLP) and computer vision (CV), their development in time series domains has been constrained by data sparsity. A series of recent studies have demonstrated that large language models (LLMs) possess robust pattern recognition and reasoning abilities over complex sequences of tokens. However, the current literature have yet striked a high-quality balance between (a) effectively aligning the time series and natural language modalities, and (b) keeping the inference efficiency. To address the above issues, we now propose the Time-LlaMA framework. Time-LlaMA first converts the time series input into token embeddings through a linear tokenization mechanism. Second, the time series token embeddings are aligned with the text prompts. Third, to further adapt the LLM backbone for time series modeling, we have developed a dynamic low-rank adaptation technique (D-LoRA). D-LoRA dynamically chooses the most suitable LoRA modules at each layer of the Transformer backbone for each time series input, enhancing the model's predictive capabilities. Our experimental results on an extensive collection of challenging real-world time series tasks confirm that our proposed method achieves the state-of-the-art (SOTA) performance.

Via

Access Paper or Ask Questions

DRUM: Learning Demonstration Retriever for Large MUlti-modal Models

Dec 10, 2024

Ellen Yi-Ge, Jiechao Gao, Wei Han, Wei Zhu

Figure 1 for DRUM: Learning Demonstration Retriever for Large MUlti-modal Models

Figure 2 for DRUM: Learning Demonstration Retriever for Large MUlti-modal Models

Figure 3 for DRUM: Learning Demonstration Retriever for Large MUlti-modal Models

Figure 4 for DRUM: Learning Demonstration Retriever for Large MUlti-modal Models

Abstract:Recently, large language models (LLMs) have demonstrated impressive capabilities in dealing with new tasks with the help of in-context learning (ICL). In the study of Large Vision-Language Models (LVLMs), when implementing ICL, researchers usually adopts the naive strategies like fixed demonstrations across different samples, or selecting demonstrations directly via a visual-language embedding model. These methods does not guarantee the configured demonstrations fit the need of the LVLMs. To address this issue, we now propose a novel framework, \underline{d}emonstration \underline{r}etriever for large m\underline{u}lti-modal \underline{m}odel (DRUM), which fine-tunes the visual-language embedding model to better meet the LVLM's needs. First, we discuss the retrieval strategies for a visual-language task, assuming an embedding model is given. And we propose to concate the image and text embeddings to enhance the retrieval performance. Second, we propose to re-rank the demonstrations retrieved by the embedding model via the LVLM's feedbacks, and calculate a list-wise ranking loss for training the embedding model. Third, we propose an iterative demonstration mining strategy to improve the training of the embedding model. Through extensive experiments on 3 types of visual-language tasks, 7 benchmark datasets, our DRUM framework is proven to be effective in boosting the LVLM's in-context learning performance via retrieving more proper demonstrations.

Via

Access Paper or Ask Questions

H-FedSN: Personalized Sparse Networks for Efficient and Accurate Hierarchical Federated Learning for IoT Applications

Dec 09, 2024

Jiechao Gao, Yuangang Li, Yue Zhao, Brad Campbell

Figure 1 for H-FedSN: Personalized Sparse Networks for Efficient and Accurate Hierarchical Federated Learning for IoT Applications

Figure 2 for H-FedSN: Personalized Sparse Networks for Efficient and Accurate Hierarchical Federated Learning for IoT Applications

Figure 3 for H-FedSN: Personalized Sparse Networks for Efficient and Accurate Hierarchical Federated Learning for IoT Applications

Figure 4 for H-FedSN: Personalized Sparse Networks for Efficient and Accurate Hierarchical Federated Learning for IoT Applications

Abstract:The proliferation of Internet of Things (IoT) has increased interest in federated learning (FL) for privacy-preserving distributed data utilization. However, traditional two-tier FL architectures inadequately adapt to multi-tier IoT environments. While Hierarchical Federated Learning (HFL) improves practicality in multi-tier IoT environments by multi-layer aggregation, it still faces challenges in communication efficiency and accuracy due to high data transfer volumes, data heterogeneity, and imbalanced device distribution, struggling to meet the low-latency and high-accuracy model training requirements of practical IoT scenarios. To overcome these limitations, we propose H-FedSN, an innovative approach for practical IoT environments. H-FedSN introduces a binary mask mechanism with shared and personalized layers to reduce communication overhead by creating a sparse network while keeping original weights frozen. To address data heterogeneity and imbalanced device distribution, we integrate personalized layers for local data adaptation and apply Bayesian aggregation with cumulative Beta distribution updates at edge and cloud levels, effectively balancing contributions from diverse client groups. Evaluations on three real-world IoT datasets and MNIST under non-IID settings demonstrate that H-FedSN significantly reduces communication costs by 58 to 238 times compared to HierFAVG while achieving high accuracy, making it highly effective for practical IoT applications in hierarchical federated learning scenarios.

Via

Access Paper or Ask Questions

FedMetaMed: Federated Meta-Learning for Personalized Medication in Distributed Healthcare Systems

Dec 05, 2024

Jiechao Gao, Yuangang Li

Figure 1 for FedMetaMed: Federated Meta-Learning for Personalized Medication in Distributed Healthcare Systems

Figure 2 for FedMetaMed: Federated Meta-Learning for Personalized Medication in Distributed Healthcare Systems

Figure 3 for FedMetaMed: Federated Meta-Learning for Personalized Medication in Distributed Healthcare Systems

Figure 4 for FedMetaMed: Federated Meta-Learning for Personalized Medication in Distributed Healthcare Systems

Abstract:Personalized medication aims to tailor healthcare to individual patient characteristics. However, the heterogeneity of patient data across healthcare systems presents significant challenges to achieving accurate and effective personalized treatments. Ethical concerns further complicate the aggregation of large volumes of data from diverse institutions. Federated Learning (FL) offers a promising decentralized solution by enabling collaborative model training through the exchange of client models rather than raw data, thus preserving privacy. However, existing FL methods often suffer from retrogression during server aggregation, leading to a decline in model performance in real-world medical FL settings. To address data variability in distributed healthcare systems, we introduce Federated Meta-Learning for Personalized Medication (FedMetaMed), which combines federated learning and meta-learning to create models that adapt to diverse patient data across healthcare systems. The FedMetaMed framework aims to produce superior personalized models for individual clients by addressing these limitations. Specifically, we introduce Cumulative Fourier Aggregation (CFA) at the server to improve stability and effectiveness in global knowledge aggregation. CFA achieves this by gradually integrating client models from low to high frequencies. At the client level, we implement a Collaborative Transfer Optimization (CTO) strategy with a three-step process - Retrieve, Reciprocate, and Refine - to enhance the personalized local model through seamless global knowledge transfer. Experiments on real-world medical imaging datasets demonstrate that FedMetaMed outperforms state-of-the-art FL methods, showing superior generalization even on out-of-distribution cohorts.

Via

Access Paper or Ask Questions