Di Wu

La Trobe University, Melbourne, Australia

FedGraM: Defending Against Untargeted Attacks in Federated Learning via Embedding Gram Matrix

May 20, 2025

Di Wu, Qian Li, Heng Yang, Yong Han

Abstract:Federated Learning (FL) enables geographically distributed clients to collaboratively train machine learning models by sharing only their local models, ensuring data privacy. However, FL is vulnerable to untargeted attacks that aim to degrade the global model's performance on the underlying data distribution. Existing defense mechanisms attempt to improve FL's resilience against such attacks, but their effectiveness is limited in practical FL environments due to data heterogeneity. In contrast, we aim to detect and remove the attacks to mitigate their impact. Generalization contribution plays a crucial role in distinguishing untargeted attacks. Our observations indicate that, with limited data, the divergence between embeddings representing different classes provides a better measure of generalization than direct accuracy. In light of this, we propose a novel robust aggregation method, FedGraM, designed to defend against untargeted attacks in FL. The server maintains an auxiliary dataset containing one sample per class to support aggregation. This dataset is fed to the local models to extract embeddings. Then, the server calculates the norm of the Gram Matrix of the embeddings for each local model. The norm serves as an indicator of each model's inter-class separation capability in the embedding space. FedGraM identifies and removes potentially malicious models by filtering out those with the largest norms, then averages the remaining local models to form the global model. We conduct extensive experiments to evaluate the performance of FedGraM. Our empirical results show that with limited data samples used to construct the auxiliary dataset, FedGraM achieves exceptional performance, outperforming state-of-the-art defense methods.
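The aggregation steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Frobenius norm, the L2 normalization of embeddings, the fixed `num_remove` parameter, and all function names are assumptions made for illustration, since the abstract does not specify them. Models are represented as flat lists of weights, and each model's embeddings are given as one vector per class.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumption: normalization is not stated in the abstract)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def gram_norm(embeddings):
    """Frobenius norm of the Gram matrix G[i][j] = <e_i, e_j>, one embedding per class."""
    unit = [l2_normalize(e) for e in embeddings]
    total = 0.0
    for a in unit:
        for b in unit:
            dot = sum(x * y for x, y in zip(a, b))
            total += dot * dot
    return math.sqrt(total)


def filter_and_average(local_weights, local_embeddings, num_remove):
    """Drop the num_remove models with the largest Gram norms, then average the rest."""
    if not 0 <= num_remove < len(local_weights):
        raise ValueError("num_remove must leave at least one model")
    norms = [gram_norm(e) for e in local_embeddings]
    ranked = sorted(range(len(norms)), key=lambda i: norms[i])  # ascending by norm
    kept = sorted(ranked[: len(ranked) - num_remove])
    dim = len(local_weights[0])
    averaged = [sum(local_weights[i][d] for i in kept) / len(kept) for d in range(dim)]
    return averaged, kept


# Hypothetical toy data: two classes, three clients.
# Client 2 maps both classes to the same direction, so its Gram norm is largest.
embeddings = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
    [[1.0, 0.0], [1.0, 0.0]],
]
weights = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
global_weights, kept_clients = filter_and_average(weights, embeddings, num_remove=1)
```

With unit-length embeddings, well-separated classes give a Gram matrix close to the identity, while collapsed classes add large off-diagonal entries and raise the norm. This is why the sketch removes the largest norms.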

Leveraging Multivariate Long-Term History Representation for Time Series Forecasting

May 20, 2025

Huiliang Zhang, Di Wu, Arnaud Zinflou, Stephane Dellacherie, Mouhamadou Makhtar Dione, Benoit Boulet

Abstract:Multivariate Time Series (MTS) forecasting has a wide range of applications in both industry and academia. Recent advances in Spatial-Temporal Graph Neural Network (STGNN) have achieved great progress in modelling spatial-temporal correlations. Limited by computational complexity, most STGNNs for MTS forecasting focus primarily on short-term and local spatial-temporal dependencies. Although some recent methods attempt to incorporate univariate history into modeling, they still overlook crucial long-term spatial-temporal similarities and correlations across MTS, which are essential for accurate forecasting. To fill this gap, we propose a framework called the Long-term Multivariate History Representation (LMHR) Enhanced STGNN for MTS forecasting. Specifically, a Long-term History Encoder (LHEncoder) is adopted to effectively encode the long-term history into segment-level contextual representations and reduce point-level noise. A non-parametric Hierarchical Representation Retriever (HRetriever) is designed to include the spatial information in the long-term spatial-temporal dependency modelling and pick out the most valuable representations with no additional training. A Transformer-based Aggregator (TAggregator) selectively fuses the sparsely retrieved contextual representations based on the ranking positional embedding efficiently. Experimental results demonstrate that LMHR outperforms typical STGNNs by 10.72% on the average prediction horizons and state-of-the-art methods by 4.12% on several real-world datasets. Additionally, it consistently improves prediction accuracy by 9.8% on the top 10% of rapidly changing patterns across the datasets.

Towards Effective Federated Graph Foundation Model via Mitigating Knowledge Entanglement

May 19, 2025

Yinlin Zhu, Xunkai Li, Jishuo Jia, Miao Hu, Di Wu, Meikang Qiu

Abstract:Recent advances in graph machine learning have shifted to data-centric paradigms, driven by two emerging fields: (1) Federated graph learning (FGL) enables multi-client collaboration but faces challenges from data and task heterogeneity, limiting its practicality; (2) Graph foundation models (GFM) offer strong domain generalization but are usually trained on single machines, missing out on cross-silo data and resources. These paradigms are complementary, and their integration brings notable benefits. Motivated by this, we propose FedGFM, a novel decentralized GFM training paradigm. However, a key challenge is knowledge entanglement, where multi-domain knowledge merges into indistinguishable representations, hindering downstream adaptation. To address this, we present FedGFM+, an enhanced framework with two core modules to reduce knowledge entanglement: (1) AncDAI: A global anchor-based domain-aware initialization strategy. Before pre-training, each client encodes its local graph into domain-specific prototypes that serve as semantic anchors. Synthetic embeddings around these anchors initialize the global model. We theoretically prove these prototypes are distinguishable across domains, providing a strong inductive bias to disentangle domain-specific knowledge. (2) AdaDPP: A local adaptive domain-sensitive prompt pool. Each client learns a lightweight graph prompt capturing domain semantics during pre-training. During fine-tuning, prompts from all clients form a pool from which the GFM selects relevant prompts to augment target graph attributes, improving downstream adaptation. FedGFM+ is evaluated on 8 diverse benchmarks across multiple domains and tasks, outperforming 20 baselines from supervised learning, FGL, and federated GFM variants.

* Under Review

What Does Neuro Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs

May 15, 2025

Xinlan Yan, Di Wu, Yibin Lei, Christof Monz, Iacer Calixto

Abstract:In this paper, we introduce S-MedQA, an English medical question-answering (QA) dataset for benchmarking large language models in fine-grained clinical specialties. We use S-MedQA to check the applicability of a popular hypothesis related to knowledge injection in the knowledge-intense scenario of medical QA, and show that: 1) training on data from a specialty does not necessarily lead to the best performance on that specialty and 2) regardless of the specialty fine-tuned on, token probabilities of clinically relevant terms for all specialties increase consistently. Thus, we believe the gains come mostly from domain shifting (e.g., general to medical) rather than knowledge injection and suggest rethinking the role of fine-tuning data in the medical domain. We release S-MedQA and all code needed to reproduce all our experiments to the research community.

Skeleton-Guided Diffusion Model for Accurate Foot X-ray Synthesis in Hallux Valgus Diagnosis

May 13, 2025

Midi Wan, Pengfei Li, Yizhuo Liang, Di Wu, Yushan Pan, Guangzhen Zhu, Hao Wang

Abstract:Medical image synthesis plays a crucial role in providing anatomically accurate images for diagnosis and treatment. Hallux valgus, which affects approximately 19% of the global population, requires frequent weight-bearing X-rays for assessment, placing additional strain on both patients and healthcare providers. Existing X-ray models often struggle to balance image fidelity, skeletal consistency, and physical constraints, particularly in diffusion-based methods that lack skeletal guidance. We propose the Skeletal-Constrained Conditional Diffusion Model (SCCDM) and introduce KCC, a foot evaluation method utilizing skeletal landmarks. SCCDM incorporates multi-scale feature extraction and attention mechanisms, improving the Structural Similarity Index (SSIM) by 5.72% (0.794) and Peak Signal-to-Noise Ratio (PSNR) by 18.34% (21.40 dB). When combined with KCC, the model achieves an average score of 0.85, demonstrating strong clinical applicability. The code is available at https://github.com/midisec/SCCDM.
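For readers unfamiliar with the PSNR metric reported in this abstract, the sketch below computes it from its standard definition, PSNR = 10 · log10(MAX² / MSE). The function name, the flat-list image representation, and the 8-bit default `max_value=255.0` are assumptions for illustration. The paper's own evaluation code is in the repository linked in the abstract and may differ.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(generated) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values indicate that the generated image is closer to the reference. Identical images yield infinity under this definition.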

LONGER: Scaling Up Long Sequence Modeling in Industrial Recommenders

May 07, 2025

Zheng Chai, Qin Ren, Xijun Xiao, Huizhi Yang, Bo Han, Sijun Zhang, Di Chen, Hui Lu, Wenlin Zhao, Lele Yu (+7 more)

Abstract:Modeling ultra-long user behavior sequences is critical for capturing both long- and short-term preferences in industrial recommender systems. Existing solutions typically rely on two-stage retrieval or indirect modeling paradigms, incurring upstream-downstream inconsistency and computational inefficiency. In this paper, we present LONGER, a Long-sequence Optimized traNsformer for GPU-Efficient Recommenders. LONGER incorporates (i) a global token mechanism for stabilizing attention over long contexts, (ii) a token merge module with lightweight InnerTransformers and hybrid attention strategy to reduce quadratic complexity, and (iii) a series of engineering optimizations, including training with mixed-precision and activation recomputation, KV cache serving, and the fully synchronous model training and serving framework for unified GPU-based dense and sparse parameter updates. LONGER consistently outperforms strong baselines in both offline metrics and online A/B testing in both advertising and e-commerce services at ByteDance, validating its consistent effectiveness and industrial-level scaling laws. Currently, LONGER has been fully deployed in more than 10 influential scenarios at ByteDance, serving billion users.

Concept Factorization via Self-Representation and Adaptive Graph Structure Learning

May 06, 2025

Zhengqin Yang, Di Wu, Jia Chen, Xin Luo

Abstract:Concept Factorization (CF) models have attracted widespread attention due to their excellent performance in data clustering. In recent years, many variant models based on CF have achieved great success in clustering by taking into account the internal geometric manifold structure of the dataset and using graph regularization techniques. However, their clustering performance depends greatly on the construction of the initial graph structure. In order to enable adaptive learning of the graph structure of the data, we propose a Concept Factorization Based on Self-Representation and Adaptive Graph Structure Learning (CFSRAG) Model. CFSRAG learns the affinity relationship between data through a self-representation method, and uses the learned affinity matrix to implement dynamic graph regularization constraints, thereby ensuring dynamic learning of the internal geometric structure of the data. Finally, we give the CFSRAG update rule and convergence analysis, and conduct comparative experiments on four real datasets. The results show that our model outperforms other state-of-the-art models.

Calibrating Translation Decoding with Quality Estimation on LLMs

Apr 26, 2025

Di Wu, Yibin Lei, Christof Monz

Abstract:Neural machine translation (NMT) systems typically employ maximum a posteriori (MAP) decoding to select the highest-scoring translation from the distribution mass. However, recent evidence highlights the inadequacy of MAP decoding, often resulting in low-quality or even pathological hypotheses -- the decoding objective is not aligned with real-world translation quality. This paper proposes calibrating hypothesis likelihoods with translation quality from a distribution view by directly optimizing their Pearson correlation -- thereby enhancing the effectiveness of translation decoding. With our method, translation on large language models (LLMs) improves substantially after limited training (2K instances per direction). This improvement is orthogonal to those achieved through supervised fine-tuning, leading to substantial gains across a broad range of metrics and human evaluations -- even when applied to top-performing translation-specialized LLMs fine-tuned on high-quality translation data, such as Tower, or when compared to recent preference optimization methods, like CPO. Moreover, the calibrated translation likelihood can directly serve as a strong proxy for translation quality, closely approximating or even surpassing some state-of-the-art translation quality estimation models, like CometKiwi. Lastly, our in-depth analysis demonstrates that calibration enhances the effectiveness of MAP decoding, thereby enabling greater efficiency in real-world deployment. The resulting state-of-the-art translation model, which covers 10 languages, along with the accompanying code and human evaluation data, has been released to the community: https://github.com/moore3930/calibrating-llm-mt.
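The quantity this abstract optimizes, the Pearson correlation between hypothesis likelihoods and translation quality, can be sketched as follows. This is only an illustration: using the negated correlation as a loss, the function names, and the toy numbers are assumptions. The authors' actual training objective is in the repository linked in the abstract.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def calibration_loss(log_likelihoods, quality_scores):
    """Hypothetical loss: minimizing it maximizes the correlation."""
    return -pearson(log_likelihoods, quality_scores)


# Hypothetical values: log-likelihoods of three sampled hypotheses for one source
# sentence, paired with quality scores from some external metric.
log_likelihoods = [-1.0, -2.0, -3.0]
quality_scores = [0.9, 0.8, 0.7]
loss = calibration_loss(log_likelihoods, quality_scores)
```

When likelihoods rank hypotheses in the same order and spacing as the quality scores, the correlation approaches 1 and the loss approaches -1. In that regime, picking the most likely hypothesis also picks the highest-quality one.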

Back to Fundamentals: Low-Level Visual Features Guided Progressive Token Pruning

Apr 25, 2025

Yuanbing Ouyang, Yizhuo Liang, Qingpeng Li, Xinfei Guo, Yiming Luo, Di Wu, Hao Wang, Yushan Pan

Abstract:Vision Transformers (ViTs) excel in semantic segmentation but demand significant computation, posing challenges for deployment on resource-constrained devices. Existing token pruning methods often overlook fundamental visual data characteristics. This study introduces 'LVTP', a progressive token pruning framework guided by multi-scale Tsallis entropy and low-level visual features with twice clustering. It integrates high-level semantics and basic visual attributes for precise segmentation. A novel dynamic scoring mechanism using multi-scale Tsallis entropy weighting overcomes limitations of traditional single-parameter entropy. The framework also incorporates low-level feature analysis to preserve critical edge information while optimizing computational cost. As a plug-and-play module, it requires no architectural changes or additional training. Evaluations across multiple datasets show 20%-45% computational reductions with negligible performance loss, outperforming existing methods in balancing cost and accuracy, especially in complex edge regions.
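As background for the Tsallis entropy mentioned in this abstract, the sketch below computes it from its standard definition, S_q = (1 - Σ p_i^q) / (q - 1), which reduces to Shannon entropy as q approaches 1. How LVTP combines entropies across scales or across values of q is not stated in the abstract, so the sketch covers only the entropy itself. The function name and the example distribution are hypothetical.

```python
import math


def tsallis_entropy(probs, q):
    """Tsallis entropy of a discrete distribution; q == 1 falls back to Shannon entropy."""
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)


# Hypothetical example: one distribution evaluated at several entropic indices q.
uniform = [0.25, 0.25, 0.25, 0.25]
entropies = {q: tsallis_entropy(uniform, q) for q in (0.5, 1.0, 2.0)}
```

Values of q below 1 give more weight to rare events and values above 1 to frequent ones, so evaluating several q values gives a richer description than a single fixed-parameter entropy.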

CKMDiff: A Generative Diffusion Model for CKM Construction via Inverse Problems with Learned Priors

Apr 24, 2025

Shen Fu, Yong Zeng, Zijian Wu, Di Wu, Shi Jin, Cheng-Xiang Wang, Xiqi Gao

Abstract:Channel knowledge map (CKM) is a promising technology to enable environment-aware wireless communications and sensing with greatly enhanced performance, by offering location-specific channel prior information for future wireless networks. One fundamental problem for CKM-enabled wireless systems lies in how to construct high-quality and complete CKM for all locations of interest, based on only limited and noisy on-site channel knowledge data. This problem resembles the long-standing ill-posed inverse problem, which tries to infer from a set of limited and noisy observations the cause factors that produced them. By utilizing the recent advances of solving inverse problems with learned priors using generative artificial intelligence (AI), we propose CKMDiff, a conditional diffusion model that can be applied to perform various tasks for CKM constructions such as denoising, inpainting, and super-resolution, without having to know the physical environment maps or transceiver locations. Furthermore, we propose an environment-aware data augmentation mechanism to enhance the model's ability to learn implicit relations between electromagnetic propagation patterns and spatial-geometric features. Extensive numerical results are provided based on the CKMImageNet and RadioMapSeer datasets, which demonstrate that the proposed CKMDiff achieves state-of-the-art performance, outperforming various benchmark methods.
