Abstract:Intelligent Transportation Systems (ITSs) have emerged as a promising solution towards ameliorating urban traffic congestion, with Traffic Signal Control (TSC) identified as a critical component. Although Multi-Agent Reinforcement Learning (MARL) algorithms have shown potential in optimizing TSC through real-time decision-making, their scalability and effectiveness often suffer from large-scale and complex environments. Typically, these limitations primarily stem from a fundamental mismatch between the exponential growth of the state space driven by the environmental heterogeneities and the limited modeling capacity of current solutions. To address these issues, this paper introduces a novel MARL framework that integrates Dynamic Graph Neural Networks (DGNNs) and Topological Data Analysis (TDA), aiming to enhance the expressiveness of environmental representations and improve agent coordination. Furthermore, inspired by the Mixture of Experts (MoE) architecture in Large Language Models (LLMs), a topology-assisted spatial pattern disentangling (TSD)-enhanced MoE is proposed, which leverages topological signatures to decouple graph features for specialized processing, thus improving the model's ability to characterize dynamic and heterogeneous local observations. The TSD module is also integrated into the policy and value networks of the Multi-agent Proximal Policy Optimization (MAPPO) algorithm, further improving decision-making efficiency and robustness. Extensive experiments conducted on real-world traffic scenarios, together with comprehensive theoretical analysis, validate the superior performance of the proposed framework, highlighting the model's scalability and effectiveness in addressing the complexities of large-scale TSC tasks.
Abstract:Online Continual Learning (OCL) presents a complex learning environment in which new data arrives in a batch-to-batch online format, and the risk of catastrophic forgetting can significantly impair model efficacy. In this study, we address OCL by introducing an innovative memory framework that incorporates a short-term memory system to retain dynamic information and a long-term memory system to archive enduring knowledge. Specifically, the long-term memory system comprises a collection of sub-memory buffers, each linked to a cluster prototype and designed to retain data samples from distinct categories. We propose a novel $K$-means-based sample selection method to identify cluster prototypes for each encountered category. To safeguard essential and critical samples, we introduce a novel memory optimisation strategy that selectively retains samples in the appropriate sub-memory buffer by evaluating each cluster prototype against incoming samples through an optimal transportation mechanism. This approach specifically promotes each sub-memory buffer to retain data samples that exhibit significant discrepancies from the corresponding cluster prototype, thereby ensuring the preservation of semantically rich information. In addition, we propose a novel Divide-and-Conquer (DAC) approach that formulates the memory updating as an optimisation problem and divides it into several subproblems. As a result, the proposed DAC approach can solve these subproblems separately and thus can significantly reduce computations of the proposed memory updating process. We conduct a series of experiments across standard and imbalanced learning settings, and the empirical findings indicate that the proposed memory framework achieves state-of-the-art performance in both learning contexts.
Abstract:Diffusion MRI (dMRI) is essential for studying brain microstructure, but high-resolution imaging remains challenging due to the inherent trade-offs between acquisition time and signal-to-noise ratio (SNR). Conventional methods often optimize only the diffusion-weighted images (DWIs) without considering their relationship with the non-diffusion-weighted (b=0) reference images. However, calculating diffusion metrics, such as the apparent diffusion coefficient (ADC) and diffusion tensor with its derived metrics like fractional anisotropy (FA) and mean diffusivity (MD), relies on the ratio between each DWI and the b=0 image, which is crucial for clinical observation and diagnostics. In this study, we demonstrate that solely enhancing DWIs using a conventional pixel-wise mean squared error (MSE) loss is insufficient, as the error in ratio between generated DWIs and b=0 diverges. We propose a novel ratio loss, defined as the MSE loss between the predicted and ground-truth log of DWI/b=0 ratios. Our results show that incorporating the ratio loss significantly improves the convergence of this ratio error, achieving lower ratio MSE and slightly enhancing the peak signal-to-noise ratio (PSNR) of generated DWIs. This leads to improved dMRI super-resolution and better preservation of b=0 ratio-based features for the derivation of diffusion metrics.
Abstract:Magnetic Resonance Imaging (MRI) is critical for clinical diagnostics but is often limited by long acquisition times and low signal-to-noise ratios, especially in modalities like diffusion and functional MRI. The multi-contrast nature of MRI presents a valuable opportunity for cross-modal enhancement, where high-resolution (HR) modalities can serve as references to boost the quality of their low-resolution (LR) counterparts-motivating the development of Multi-Contrast Super-Resolution (MCSR) techniques. Prior work has shown that leveraging complementary contrasts can improve SR performance; however, effective feature extraction and fusion across modalities with varying resolutions remains a major challenge. Moreover, existing MCSR methods often assume fixed resolution settings and all require large, perfectly paired training datasets-conditions rarely met in real-world clinical environments. To address these challenges, we propose a novel Modular Multi-Contrast Super-Resolution (MCSR) framework that eliminates the need for paired training data and supports arbitrary upscaling. Our method decouples the MCSR task into two stages: (1) Unpaired Cross-Modal Synthesis (U-CMS), which translates a high-resolution reference modality into a synthesized version of the target contrast, and (2) Unsupervised Super-Resolution (U-SR), which reconstructs the final output using implicit neural representations (INRs) conditioned on spatial coordinates. This design enables scale-agnostic and anatomically faithful reconstruction by bridging un-paired cross-modal synthesis with unsupervised resolution enhancement. Experiments show that our method achieves superior performance at 4x and 8x upscaling, with improved fidelity and anatomical consistency over existing baselines. Our framework demonstrates strong potential for scalable, subject-specific, and data-efficient MCSR in real-world clinical settings.
Abstract:The segmentation of cranial nerves (CNs) tract provides a valuable quantitative tool for the analysis of the morphology and trajectory of individual CNs. Multimodal CNs tract segmentation networks, e.g., CNTSeg, which combine structural Magnetic Resonance Imaging (MRI) and diffusion MRI, have achieved promising segmentation performance. However, it is laborious or even infeasible to collect complete multimodal data in clinical practice due to limitations in equipment, user privacy, and working conditions. In this work, we propose a novel arbitrary-modal fusion network for volumetric CNs tract segmentation, called CNTSeg-v2, which trains one model to handle different combinations of available modalities. Instead of directly combining all the modalities, we select T1-weighted (T1w) images as the primary modality due to its simplicity in data acquisition and contribution most to the results, which supervises the information selection of other auxiliary modalities. Our model encompasses an Arbitrary-Modal Collaboration Module (ACM) designed to effectively extract informative features from other auxiliary modalities, guided by the supervision of T1w images. Meanwhile, we construct a Deep Distance-guided Multi-stage (DDM) decoder to correct small errors and discontinuities through signed distance maps to improve segmentation accuracy. We evaluate our CNTSeg-v2 on the Human Connectome Project (HCP) dataset and the clinical Multi-shell Diffusion MRI (MDM) dataset. Extensive experimental results show that our CNTSeg-v2 achieves state-of-the-art segmentation performance, outperforming all competing methods.
Abstract:Cardiac diffusion tensor imaging (DTI) offers unique insights into cardiomyocyte arrangements, bridging the gap between microscopic and macroscopic cardiac function. However, its clinical utility is limited by technical challenges, including a low signal-to-noise ratio, aliasing artefacts, and the need for accurate quantitative fidelity. To address these limitations, we introduce RSFR (Reconstruction, Segmentation, Fusion & Refinement), a novel framework for cardiac diffusion-weighted image reconstruction. RSFR employs a coarse-to-fine strategy, leveraging zero-shot semantic priors via the Segment Anything Model and a robust Vision Mamba-based reconstruction backbone. Our framework integrates semantic features effectively to mitigate artefacts and enhance fidelity, achieving state-of-the-art reconstruction quality and accurate DT parameter estimation under high undersampling rates. Extensive experiments and ablation studies demonstrate the superior performance of RSFR compared to existing methods, highlighting its robustness, scalability, and potential for clinical translation in quantitative cardiac DTI.
Abstract:Extractive reading comprehension question answering (QA) datasets are typically evaluated using Exact Match (EM) and F1-score, but these metrics often fail to fully capture model performance. With the success of large language models (LLMs), they have been employed in various tasks, including serving as judges (LLM-as-a-judge). In this paper, we reassess the performance of QA models using LLM-as-a-judge across four reading comprehension QA datasets. We examine different families of LLMs and various answer types to evaluate the effectiveness of LLM-as-a-judge in these tasks. Our results show that LLM-as-a-judge is highly correlated with human judgments and can replace traditional EM/F1 metrics. By using LLM-as-a-judge, the correlation with human judgments improves significantly, from 0.17 (EM) and 0.36 (F1-score) to 0.85. These findings confirm that EM and F1 metrics underestimate the true performance of the QA models. While LLM-as-a-judge is not perfect for more difficult answer types (e.g., job), it still outperforms EM/F1, and we observe no bias issues, such as self-preference, when the same model is used for both the QA and judgment tasks.
Abstract:Quantitative MR (qMR) can provide numerical values representing the physical and chemical properties of the tissues. To collect a series of frames under varying settings, retrospective motion correction is essential to align the corresponding anatomical points or features. Under the assumption that the misalignment makes the discrepancy between the corresponding features larger, fitting error is a commonly used evaluation metric for motion correction in qMR. This study evaluates the reliability of the fitting error metric in cardiac diffusion tensor imaging (cDTI) after deformable registration. We found that while fitting error correlates with the negative eigenvalues, the negative Jacobian Determinant increases with broken cardiomyocytes, indicated by helix angle gradient line profiles. Since fitting error measures the distance between moved points and their re-rendered counterparts, the fitting parameter itself may be adjusted due to poor registration. Therefore, fitting error in deformable registration itself is a necessary but not sufficient metric and should be combined with other metrics.
Abstract:Given the scarcity and cost of high-field MRI, the synthesis of high-field MRI from low-field MRI holds significant potential when there is limited data for training downstream tasks (e.g. segmentation). Low-field MRI often suffers from a reduced signal-to-noise ratio (SNR) and spatial resolution compared to high-field MRI. However, synthesizing high-field MRI data presents challenges. These involve aligning image features across domains while preserving anatomical accuracy and enhancing fine details. To address these challenges, we propose a Pretext Task Adversarial (PTA) learning framework for high-field MRI synthesis from low-field MRI data. The framework comprises three processes: (1) The slice-wise gap perception (SGP) network aligns the slice inconsistencies of low-field and high-field datasets based on contrastive learning. (2) The local structure correction (LSC) network extracts local structures by restoring the locally rotated and masked images. (3) The pretext task-guided adversarial training process introduces additional supervision and incorporates a discriminator to improve image realism. Extensive experiments on low-field to ultra high-field task demonstrate the effectiveness of our method, achieving state-of-the-art performance (16.892 in FID, 1.933 in IS, and 0.324 in MS-SSIM). This enables the generation of high-quality high-field-like MRI data from low-field MRI data to augment training datasets for downstream tasks. The code is available at: https://github.com/Zhenxuan-Zhang/PTA4Unpaired_HF_MRI_SYN.
Abstract:Automatic medical report generation supports clinical diagnosis, reduces the workload of radiologists, and holds the promise of improving diagnosis consistency. However, existing evaluation metrics primarily assess the accuracy of key medical information coverage in generated reports compared to human-written reports, while overlooking crucial details such as the location and certainty of reported abnormalities. These limitations hinder the comprehensive assessment of the reliability of generated reports and pose risks in their selection for clinical use. Therefore, we propose a Granular Explainable Multi-Agent Score (GEMA-Score) in this paper, which conducts both objective quantification and subjective evaluation through a large language model-based multi-agent workflow. Our GEMA-Score parses structured reports and employs NER-F1 calculations through interactive exchanges of information among agents to assess disease diagnosis, location, severity, and uncertainty. Additionally, an LLM-based scoring agent evaluates completeness, readability, and clinical terminology while providing explanatory feedback. Extensive experiments validate that GEMA-Score achieves the highest correlation with human expert evaluations on a public dataset, demonstrating its effectiveness in clinical scoring (Kendall coefficient = 0.70 for Rexval dataset and Kendall coefficient = 0.54 for RadEvalX dataset). The anonymous project demo is available at: https://github.com/Zhenxuan-Zhang/GEMA_score.