Shengyu Zhang

Yusuf Hamied Department of Chemistry, University of Cambridge, UK

FedCFA: Alleviating Simpson's Paradox in Model Aggregation with Counterfactual Federated Learning

Dec 25, 2024

Zhonghua Jiang, Jimin Xu, Shengyu Zhang, Tao Shen, Jiwei Li, Kun Kuang, Haibin Cai, Fei Wu

Abstract:Federated learning (FL) is a promising technology for data privacy and distributed optimization, but it suffers from data imbalance and heterogeneity among clients. Existing FL methods try to solve these problems by aligning client models with the server model or by correcting client models with control variables. These methods excel on IID and general Non-IID data but perform only moderately in Simpson's Paradox scenarios. Simpson's Paradox refers to the phenomenon in which a trend observed on the global dataset disappears or reverses on a subset, so the global model obtained through aggregation in FL may not accurately reflect the global data distribution. Thus, we propose FedCFA, a novel FL framework that employs counterfactual learning to generate counterfactual samples by replacing critical factors of local data with global average data, aligning local data distributions with the global distribution and mitigating the effects of Simpson's Paradox. In addition, to improve the quality of counterfactual samples, we introduce a factor decorrelation (FDC) loss to reduce the correlation among features and thus improve the independence of the extracted factors. We conduct extensive experiments on six datasets and verify that our method outperforms other FL methods in terms of efficiency and global model accuracy under limited communication rounds.
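
To make the Simpson's Paradox reversal described above concrete, here is a minimal Python sketch. It is not taken from the paper: it uses the classic kidney-stone treatment counts often quoted in statistics textbooks, and the names `data`, `rate`, and `pooled_rate` are illustrative assumptions. Each subgroup can be read loosely as one client's local data and the pooled total as the global dataset.

```python
from fractions import Fraction

# (successes, trials) per subgroup for two treatments.
# Illustrative textbook figures, not data from the FedCFA paper.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def rate(successes, trials):
    """Exact success rate for one subgroup."""
    return Fraction(successes, trials)

def pooled_rate(groups):
    """Success rate after pooling all subgroups of one treatment."""
    total_successes = sum(s for s, _ in groups.values())
    total_trials = sum(n for _, n in groups.values())
    return Fraction(total_successes, total_trials)

# Which treatment wins inside each subgroup?
subgroup_winner = {
    group: max(data, key=lambda t: rate(*data[t][group]))
    for group in ("small", "large")
}

# Which treatment wins once the subgroups are pooled?
pooled_winner = max(data, key=lambda t: pooled_rate(data[t]))

print(subgroup_winner)  # treatment A wins in both subgroups
print(pooled_winner)    # treatment B wins on the pooled data
```

The reversal arises because the two treatments are applied to the subgroups in very different proportions, which is the kind of distribution mismatch between local and global data that the abstract refers to.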

Non-Terrestrial Networking for 6G: Evolution, Opportunities, and Future Directions

Dec 01, 2024

Feng Wang, Shengyu Zhang, Huiting Yang, Tony Q. S. Quek

Abstract:From 5G onwards, Non-Terrestrial Networks (NTNs) have emerged as a key component of future network architectures. Leveraging Low Earth Orbit (LEO) satellite constellations, NTNs are capable of building a space Internet and present a paradigm shift in delivering mobile services to even the most remote regions on Earth. However, the extensive coverage and rapid movement of LEO satellites pose unique challenges for NTN networking, including user equipment (UE) access and inter-satellite delivery, which directly impact the quality of service (QoS) and data transmission continuity. This paper offers an in-depth review of advanced NTN management technologies in the context of 6G evolution, focusing on radio resource management, mobility management, and dynamic network slicing. Building on this foundation and considering the latest trends in NTN development, we then present some innovative perspectives on emerging challenges in satellite beamforming, handover mechanisms, and inter-satellite transmissions. Lastly, we identify open research issues and propose future directions aimed at advancing satellite Internet deployment and enhancing NTN performance.

Preliminary Evaluation of the Test-Time Training Layers in Recommendation System (Student Abstract)

Nov 19, 2024

Tianyu Zhan, Zheqi Lv, Shengyu Zhang, Jiwei Li

Abstract:This paper explores the application and effectiveness of Test-Time Training (TTT) layers in improving the performance of recommendation systems. We developed a model, TTT4Rec, utilizing TTT-Linear as the feature extraction layer. Our tests across multiple datasets indicate that TTT4Rec, as a base model, performs comparably to, or even surpasses, other baseline models in similar environments.

* To be published in AAAI-25 Student Abstract and Poster Program

ProGraph: Temporally-alignable Probability Guided Graph Topological Modeling for 3D Human Reconstruction

Nov 07, 2024

Hongsheng Wang, Zehui Feng, Tong Xiao, Genfan Yang, Shengyu Zhang, Fei Wu, Feng Lin

Abstract:Current 3D human motion reconstruction methods from monocular videos rely on features within the current reconstruction window, leading to distortion and deformations in the human structure under local occlusions or blurriness in video frames. To estimate realistic 3D human mesh sequences based on incomplete features, we propose Temporally-alignable Probability Guided Graph Topological Modeling for 3D Human Reconstruction (ProGraph). To recover missing parts, we exploit the explicit topology-aware probability distribution across the entire motion sequence. To restore the complete human, Graph Topological Modeling (GTM) learns the underlying topological structure, focusing on the relationships inherent in the individual parts. Next, to generate blurred motion parts, Temporal-alignable Probability Distribution (TPDist) utilizes the GTM to predict features based on distribution. This interactive mechanism facilitates motion consistency, allowing the restoration of human parts. Furthermore, Hierarchical Human Loss (HHLoss) constrains the probability distribution errors of inter-frame features during topological structure variation. Our method achieves better results than other SOTA methods in addressing occlusions and blurriness on 3DPW.

Fine-Grained Guidance for Retrievers: Leveraging LLMs' Feedback in Retrieval-Augmented Generation

Nov 06, 2024

Yuhang Liu, Xueyu Hu, Shengyu Zhang, Jingyuan Chen, Fan Wu, Fei Wu

Abstract:Retrieval-Augmented Generation (RAG) has proven to be an effective method for mitigating hallucination issues inherent in large language models (LLMs). Previous approaches typically train retrievers based on semantic similarity, lacking optimization for RAG. More recent works have proposed aligning retrievers with the preference signals of LLMs. However, these preference signals are often difficult for dense retrievers, which typically have weaker language capabilities, to understand and learn effectively. Drawing inspiration from pedagogical theories like Guided Discovery Learning, we propose a novel framework, FiGRet (Fine-grained Guidance for Retrievers), which leverages the language capabilities of LLMs to construct examples from a more granular, information-centric perspective to guide the learning of retrievers. Specifically, our method utilizes LLMs to construct easy-to-understand examples from samples where the retriever performs poorly, focusing on three learning objectives highly relevant to the RAG scenario: relevance, comprehensiveness, and purity. These examples serve as scaffolding to ultimately align the retriever with the LLM's preferences. Furthermore, we employ a dual curriculum learning strategy and leverage the reciprocal feedback between LLM and retriever to further enhance the performance of the RAG system. A series of experiments demonstrate that our proposed framework enhances the performance of RAG systems equipped with different retrievers and is applicable to various LLMs.

* 13 pages, 4 figures

Multi-Uncertainty Aware Autonomous Cooperative Planning

Nov 01, 2024

Shiyao Zhang, He Li, Shengyu Zhang, Shuai Wang, Derrick Wing Kwan Ng, Chengzhong Xu

Abstract:Autonomous cooperative planning (ACP) is a promising technique to improve the efficiency and safety of multi-vehicle interactions for future intelligent transportation systems. However, realizing robust ACP is a challenge due to the aggregation of perception, motion, and communication uncertainties. This paper proposes a novel multi-uncertainty aware ACP (MUACP) framework that simultaneously accounts for multiple types of uncertainties via regularized cooperative model predictive control (RC-MPC). The regularizers and constraints for perception, motion, and communication are constructed according to the confidence levels, weather conditions, and outage probabilities, respectively. The effectiveness of the proposed method is evaluated in the Car Learning to Act (CARLA) simulation platform. Results demonstrate that the proposed MUACP efficiently performs cooperative formation in real time and outperforms other benchmark approaches in various scenarios under imperfect knowledge of the environment.

Unconstrained Model Merging for Enhanced LLM Reasoning

Oct 17, 2024

Yiming Zhang, Baoyi He, Shengyu Zhang, Yuhao Fu, Qi Zhou, Zhijie Sang, Zijin Hong, Kejing Yang, Wenjun Wang, Jianbo Yuan (+5 more)

Abstract:Recent advancements in building domain-specific large language models (LLMs) have shown remarkable success, especially in tasks requiring reasoning abilities like logical inference over complex relationships and multi-step problem solving. However, creating a powerful all-in-one LLM remains challenging due to the need for proprietary data and vast computational resources. As a resource-friendly alternative, we explore the potential of merging multiple expert models into a single LLM. Existing studies on model merging mainly focus on generalist LLMs instead of domain experts, or on LLMs with the same architecture and size. In this work, we propose an unconstrained model merging framework that accommodates both homogeneous and heterogeneous model architectures with a focus on reasoning tasks. A fine-grained layer-wise weight merging strategy is designed for homogeneous model merging, while heterogeneous model merging is built upon the probabilistic distribution knowledge derived from instruction-response fine-tuning data. Across 7 benchmarks and 9 reasoning-optimized LLMs, we find that combinatorial reasoning emerges from merging and surpasses simple additive effects. We propose that unconstrained model merging could serve as a foundation for decentralized LLMs, marking a notable progression from the existing centralized LLM framework. This evolution could enhance wider participation and stimulate additional advancement in the field of artificial intelligence, effectively addressing the constraints posed by centralized models.
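
As a rough illustration of what layer-wise weight merging of two same-architecture models can look like, the following Python sketch linearly interpolates parameters with one coefficient per layer. This is an assumption made for illustration, not the paper's actual strategy: the function `merge_layerwise`, the `alphas` mapping, and the flat-list parameter format are all hypothetical.

```python
def merge_layerwise(model_a, model_b, alphas, default_alpha=0.5):
    """Interpolate two parameter dicts layer by layer.

    model_a, model_b: dicts mapping layer name -> flat list of floats.
    alphas: dict mapping layer name -> weight given to model_a in that layer
            (hypothetical per-layer coefficients; model_b gets 1 - alpha).
    """
    if model_a.keys() != model_b.keys():
        raise ValueError("models must share the same layer names")

    merged = {}
    for name, weights_a in model_a.items():
        weights_b = model_b[name]
        if len(weights_a) != len(weights_b):
            raise ValueError(f"shape mismatch in layer {name!r}")
        alpha = alphas.get(name, default_alpha)
        # Convex combination of the two experts' weights for this layer.
        merged[name] = [
            alpha * wa + (1.0 - alpha) * wb
            for wa, wb in zip(weights_a, weights_b)
        ]
    return merged


# Toy usage: two "experts" with two layers each.
expert_math = {"layer0": [1.0, 2.0], "layer1": [0.0, 4.0]}
expert_code = {"layer0": [3.0, 6.0], "layer1": [2.0, 0.0]}
merged = merge_layerwise(expert_math, expert_code,
                         alphas={"layer0": 0.5, "layer1": 0.25})
print(merged)
```

A per-layer coefficient is the simplest way to make merging "fine-grained"; how such coefficients would be chosen is left open here, and heterogeneous architectures cannot be merged this way because their layers do not align.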

* Under review

Semantic Codebook Learning for Dynamic Recommendation Models

Jul 31, 2024

Zheqi Lv, Shaoxuan He, Tianyu Zhan, Shengyu Zhang, Wenqiao Zhang, Jingyuan Chen, Zhou Zhao, Fei Wu

Abstract:Dynamic sequential recommendation (DSR) can generate model parameters based on user behavior to improve the personalization of sequential recommendation under various user preferences. However, it faces the challenges of a large parameter search space and sparse, noisy user-item interactions, which reduce the applicability of the generated model parameters. The Semantic Codebook Learning for Dynamic Recommendation Models (SOLID) framework presents a significant advancement in DSR by effectively tackling these challenges. By transforming item sequences into semantic sequences and employing a dual parameter model, SOLID compresses the parameter generation search space and leverages homogeneity within the recommendation system. The introduction of the semantic metacode and semantic codebook, which stores disentangled item representations, ensures robust and accurate parameter generation. Extensive experiments demonstrate that SOLID consistently outperforms existing DSR methods, delivering more accurate, stable, and robust recommendations.

DIET: Customized Slimming for Incompatible Networks in Sequential Recommendation

Jun 15, 2024

Kairui Fu, Shengyu Zhang, Zheqi Lv, Jingyuan Chen, Jiwei Li

Abstract:Due to the continuously improving capabilities of mobile edges, recommender systems have started to deploy models on edges to alleviate network congestion caused by frequent mobile requests. Several studies have leveraged the proximity of the edge side to real-time data, fine-tuning models to create edge-specific versions. Despite their significant progress, these methods require substantial on-edge computational resources and frequent network transfers to keep the model up to date. The former may disrupt other processes on the edge to acquire computational resources, while the latter consumes network bandwidth, leading to a decrease in user satisfaction. In response to these challenges, we propose a customizeD slImming framework for incompatiblE neTworks (DIET). DIET deploys the same generic backbone (potentially incompatible for a specific edge) to all devices. To minimize frequent bandwidth usage and storage consumption in personalization, DIET tailors specific subnets for each edge based on its past interactions, learning to generate slimming subnets (diets) within incompatible networks for efficient transfer. It also takes the inter-layer relationships into account, empirically reducing inference time while obtaining more suitable diets. We further explore the repeated modules within networks and propose a more storage-efficient framework, DIETING, which utilizes a single layer of parameters to represent the entire network, achieving comparably excellent performance. The experiments across four state-of-the-art datasets and two widely used models demonstrate the superior recommendation accuracy and the transmission and storage efficiency of our framework.

* Accepted by KDD 2024

Causal Distillation for Alleviating Performance Heterogeneity in Recommender Systems

May 31, 2024

Shengyu Zhang, Ziqi Jiang, Jiangchao Yao, Fuli Feng, Kun Kuang, Zhou Zhao, Shuo Li, Hongxia Yang, Tat-Seng Chua, Fei Wu

Abstract:Recommendation performance usually exhibits a long-tail distribution over users -- a small portion of head users enjoy much more accurate recommendation services than the others. We reveal two sources of this performance heterogeneity problem: the uneven distribution of historical interactions (a natural source); and the biased training of recommender models (a model source). Since addressing this problem must not sacrifice overall performance, a wise choice is to eliminate the model bias while maintaining the natural heterogeneity. The key to debiased training lies in eliminating the effect of confounders that influence both the user's historical behaviors and the next behavior. The emerging causal recommendation methods achieve this by modeling the causal effect between user behaviors, but potentially neglect unobserved confounders (e.g., friend suggestions) that are hard to measure in practice. To address unobserved confounders, we resort to the front-door adjustment (FDA) in causal theory and propose a causal multi-teacher distillation framework (CausalD). FDA requires proper mediators in order to estimate the causal effects of historical behaviors on the next behavior. To achieve this, we equip CausalD with multiple heterogeneous recommendation models to model the mediator distribution. Then, the causal effect estimated by FDA is the expectation of recommendation prediction over the mediator distribution and the prior distribution of historical behaviors, which is technically achieved by a multi-teacher ensemble. To pursue efficient inference, CausalD further distills multiple teachers into one student model to directly infer the causal effect for making recommendations.
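
For reference, the standard front-door adjustment formula from causal inference matches the description above of an expectation over the mediator distribution and the prior over historical behaviors. The symbols are assumptions chosen here, not notation from the paper: x denotes the historical behaviors, m the mediator, and y the next behavior.

```latex
P\bigl(y \mid \mathrm{do}(x)\bigr)
  \;=\; \sum_{m} P(m \mid x) \sum_{x'} P(y \mid m, x')\, P(x')
```

The outer sum averages over mediators given the observed history, and the inner sum averages the prediction over the prior distribution of histories x', which is what removes the influence of the unobserved confounder.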

* TKDE 2023
