Abstract:Federated learning enables collaborative model training across distributed clients while preserving data privacy. However, in practical deployments, device heterogeneity, non-independent, and identically distributed (Non-IID) data often lead to highly unstable and biased gradient updates. When differential privacy is enforced, conventional fixed gradient clipping and Gaussian noise injection may further amplify gradient perturbations, resulting in training oscillation and performance degradation and degraded model performance. To address these challenges, we propose an adaptive differentially private federated learning framework that explicitly targets model efficiency under heterogeneous and privacy-constrained settings. On the client side, a lightweight local compressed module is introduced to regularize intermediate representations and constrain gradient variability, thereby mitigating noise amplification during local optimization. On the server side, an adaptive gradient clipping strategy dynamically adjusts clipping thresholds based on historical update statistics to avoid over-clipping and noise domination. Furthermore, a constraint-aware aggregation mechanism is designed to suppress unreliable or noise-dominated client updates and stabilize global optimization. Extensive experiments on CIFAR-10 and SVHN demonstrate improved convergence stability and classification accuracy.
Abstract:With rapid expansion of cellular networks and the proliferation of mobile devices, cellular traffic data exhibits complex temporal dynamics and spatial correlations, posing challenges to accurate traffic prediction. Previous methods often focus predominantly on temporal modeling or depend on predefined spatial topologies, which limits their ability to jointly model spatio-temporal dependencies and effectively capture periodic patterns in cellular traffic. To address these issues, we propose a cellular traffic prediction framework that integrates spatio-temporal modeling with time-frequency analysis. First, we construct a spatial modeling branch to capture inter-cell dependencies through an attention mechanism, minimizing the reliance on predefined topological structures. Second, we build a time-frequency modeling branch to enhance the representation of periodic patterns. Furthermore, we introduce an adaptive-scale LogCosh loss function, which adjusts the error penalty based on traffic magnitude, preventing large errors from dominating the training process and helping the model maintain relatively stable prediction accuracy across different traffic intensities. Experiments on three open-sourced datasets demonstrate that the proposed method achieves prediction performance superior to state-of-the-art approaches.
Abstract:Network traffic forecasting plays a crucial role in intelligent network operations, but existing techniques often perform poorly when faced with limited data. Additionally, multi-task learning methods struggle with task imbalance and negative transfer, especially when modeling various service types. To overcome these challenges, we propose Sim-MSTNet, a multi-task spatiotemporal network traffic forecasting model based on the sim2real approach. Our method leverages a simulator to generate synthetic data, effectively addressing the issue of poor generalization caused by data scarcity. By employing a domain randomization technique, we reduce the distributional gap between synthetic and real data through bi-level optimization of both sample weighting and model training. Moreover, Sim-MSTNet incorporates attention-based mechanisms to selectively share knowledge between tasks and applies dynamic loss weighting to balance task objectives. Extensive experiments on two open-source datasets show that Sim-MSTNet consistently outperforms state-of-the-art baselines, achieving enhanced accuracy and generalization.
Abstract:Continual Visual Instruction Tuning (CVIT) enables Multimodal Large Language Models (MLLMs) to incrementally learn new tasks over time. However, this process is challenged by catastrophic forgetting, where performance on previously learned tasks deteriorates as the model adapts to new ones. A common approach to mitigate forgetting is architecture expansion, which introduces task-specific modules to prevent interference. Yet, existing methods often expand entire layers for each task, leading to significant parameter overhead and poor scalability. To overcome these issues, we introduce LoRA in LoRA (LiLoRA), a highly efficient architecture expansion method tailored for CVIT in MLLMs. LiLoRA shares the LoRA matrix A across tasks to reduce redundancy, applies an additional low-rank decomposition to matrix B to minimize task-specific parameters, and incorporates a cosine-regularized stability loss to preserve consistency in shared representations over time. Extensive experiments on a diverse CVIT benchmark show that LiLoRA consistently achieves superior performance in sequential task learning while significantly improving parameter efficiency compared to existing approaches.
Abstract:Anomalies are common in network system monitoring. When manifested as network threats to be mitigated, service outages to be prevented, and security risks to be ameliorated, detecting such anomalous network behaviors becomes of great importance. However, the growing scale and complexity of the mobile communication networks, as well as the ever-increasing amount and dimensionality of the network surveillance data, make it extremely difficult to monitor a mobile network and discover abnormal network behaviors. Recent advances in machine learning allow for obtaining near-optimal solutions to complicated decision-making problems with many sources of uncertainty that cannot be accurately characterized by traditional mathematical models. However, most machine learning algorithms are centralized, which renders them inapplicable to a large-scale distributed wireless networks with tens of millions of mobile devices. In this article, we present fog intelligence, a distributed machine learning architecture that enables intelligent wireless network management. It preserves the advantage of both edge processing and centralized cloud computing. In addition, the proposed architecture is scalable, privacy-preserving, and well suited for intelligent management of a distributed wireless network.
Abstract:Network traffic prediction techniques have attracted much attention since they are valuable for network congestion control and user experience improvement. While existing prediction techniques can achieve favorable performance when there is sufficient training data, it remains a great challenge to make accurate predictions when only a small amount of training data is available. To tackle this problem, we propose a deep learning model, entitled MetaSTNet, based on a multimodal meta-learning framework. It is an end-to-end network architecture that trains the model in a simulator and transfers the meta-knowledge to a real-world environment, which can quickly adapt and obtain accurate predictions on a new task with only a small amount of real-world training data. In addition, we further employ cross conformal prediction to assess the calibrated prediction intervals. Extensive experiments have been conducted on real-world datasets to illustrate the efficiency and effectiveness of MetaSTNet.
Abstract:Network traffic prediction plays a crucial role in intelligent network operation. Traditional prediction methods often rely on centralized training, necessitating the transfer of vast amounts of traffic data to a central server. This approach can lead to latency and privacy concerns. To address these issues, federated learning integrated with differential privacy has emerged as a solution to improve data privacy and model robustness in distributed settings. Nonetheless, existing federated learning protocols are vulnerable to Byzantine attacks, which may significantly compromise model robustness. Developing a robust and privacy-preserving prediction model in the presence of Byzantine clients remains a significant challenge. To this end, we propose an asynchronous differential federated learning framework based on distributionally robust optimization. The proposed framework utilizes multiple clients to train the prediction model collaboratively with local differential privacy. In addition, regularization techniques have been employed to further improve the Byzantine robustness of the models. We have conducted extensive experiments on three real-world datasets, and the results elucidate that our proposed distributed algorithm can achieve superior performance over existing methods.
Abstract:Presentation Attack Detection and Face Forgery Detection are designed to protect face data from physical media-based Presentation Attacks and digital editing-based DeepFakes respectively. But separate training of these two models makes them vulnerable to unknown attacks and burdens deployment environments. The lack of a Unified Face Attack Detection model to handle both types of attacks is mainly due to two factors. First, there's a lack of adequate benchmarks for models to explore. Existing UAD datasets have limited attack types and samples, restricting the model's ability to address advanced threats. To address this, we propose UniAttackDataPlus (UniAttackData+), the most extensive and sophisticated collection of forgery techniques to date. It includes 2,875 identities and their 54 kinds of falsified samples, totaling 697,347 videos. Second, there's a lack of a reliable classification criterion. Current methods try to find an arbitrary criterion within the same semantic space, which fails when encountering diverse attacks. So, we present a novel Visual-Language Model-based Hierarchical Prompt Tuning Framework (HiPTune) that adaptively explores multiple classification criteria from different semantic spaces. We build a Visual Prompt Tree to explore various classification rules hierarchically. Then, by adaptively pruning the prompts, the model can select the most suitable prompts to guide the encoder to extract discriminative features at different levels in a coarse-to-fine way. Finally, to help the model understand the classification criteria in visual space, we propose a Dynamically Prompt Integration module to project the visual prompts to the text encoder for more accurate semantics. Experiments on 12 datasets have shown the potential to inspire further innovations in the UAD field.
Abstract:Cost-aware Dynamic Workflow Scheduling (CADWS) is a key challenge in cloud computing, focusing on devising an effective scheduling policy to efficiently schedule dynamically arriving workflow tasks, represented as Directed Acyclic Graphs (DAG), to suitable virtual machines (VMs). Deep reinforcement learning (DRL) has been widely employed for automated scheduling policy design. However, the performance of DRL is heavily influenced by the design of the problem-tailored policy network and is highly sensitive to hyperparameters and the design of reward feedback. Considering the above-mentioned issues, this study proposes a novel DRL method combining Graph Attention Networks-based policy network and Evolution Strategy, referred to as GATES. The contributions of GATES are summarized as follows: (1) GATES can capture the impact of current task scheduling on subsequent tasks by learning the topological relationships between tasks in a DAG. (2) GATES can learn the importance of each VM to ready tasks, increasing the chance of selecting the optimal VM. (3) Utilizing Evolution Strategy's robustness, exploratory nature, and tolerance for delayed rewards, GATES achieves stable policy learning in CADWS. Extensive experimental results demonstrate the superiority of the proposed GATES in CADWS, outperforming several state-of-the-art algorithms. Codes are available at: https://github.com/YaShen998/GATES
Abstract:Community detection, a vital technology for real-world applications, uncovers cohesive node groups (communities) by leveraging both topological and attribute similarities in social networks. However, existing Graph Convolutional Networks (GCNs) trained to maximize modularity often converge to suboptimal solutions. Additionally, directly using human-labeled communities for training can undermine topological cohesiveness by grouping disconnected nodes based solely on node attributes. We address these issues by proposing a novel Topological and Attributive Similarity-based Community detection (TAS-Com) method. TAS-Com introduces a novel loss function that exploits the highly effective and scalable Leiden algorithm to detect community structures with global optimal modularity. Leiden is further utilized to refine human-labeled communities to ensure connectivity within each community, enabling TAS-Com to detect community structures with desirable trade-offs between modularity and compliance with human labels. Experimental results on multiple benchmark networks confirm that TAS-Com can significantly outperform several state-of-the-art algorithms.