Sherman
Abstract:Multi-agent reinforcement learning (MARL) has achieved strong performance in cooperative adversarial tasks. However, most existing methods typically train agents against fixed opponent strategies and rely on such meta-static difficulty conditions, which limits their adaptability to changing environments and often leads to suboptimal policies. Inspired by the success of curriculum learning (CL) in supervised tasks, we propose a dynamic CL framework for MARL that employs an self-adaptive difficulty adjustment mechanism. This mechanism continuously modulates opponent strength based on real-time agent training performance, allowing agents to progressively learn from easier to more challenging scenarios. However, the dynamic nature of CL introduces instability due to nonstationary environments and sparse global rewards. To address this challenge, we develop a Counterfactual Group Relative Policy Advantage (CGRPA), which is tightly coupled with the curriculum by providing intrinsic credit signals that reflect each agent's impact under evolving task demands. CGRPA constructs a counterfactual advantage function that isolates individual contributions within group behavior, facilitating more reliable policy updates throughout the curriculum. CGRPA evaluates each agent's contribution through constructing counterfactual action advantage function, providing intrinsic rewards that enhance credit assignment and stabilize learning under non-stationary conditions. Extensive experiments demonstrate that our method improves both training stability and final performance, achieving competitive results against state-of-the-art methods. The code is available at https://github.com/NICE-HKU/CL2MARL-SMAC.
Abstract:Integrated sensing and communication (ISAC) has been envisioned to play a more important role in future wireless networks. However, the design of ISAC networks is challenging, especially when there are multiple communication and sensing (C\&S) nodes and multiple sensing targets. We investigate a multi-base station (BS) ISAC network in which multiple BSs equipped with multiple antennas simultaneously provide C\&S services for multiple ground communication users (CUs) and targets. To enhance the overall performance of C\&S, we formulate a joint user association (UA) and multi-BS transmit beamforming optimization problem with the objective of maximizing the total sum rate of all CUs while ensuring both the minimum target detection and parameter estimation requirements. To efficiently solve the highly non-convex mixed integer nonlinear programming (MINLP) optimization problem, we propose an alternating optimization (AO)-based algorithm that decomposes the problem into two sub-problems, i.e., UA optimization and multi-BS transmit beamforming optimization. Inspired by large language models (LLMs) for prediction and inference, we propose a unified framework integrating LLMs with convex-based optimization methods. First, we propose a comprehensive design of prompt engineering, including few-shot, chain of thought, and self-reflection techniques to guide LLMs in solving the binary integer programming UA optimization problem. Second, we utilize convex-based optimization methods to handle the non-convex beamforming optimization problem based on fractional programming (FP), majorization minimization (MM), and the alternating direction method of multipliers (ADMM) with an optimized UA from LLMs. Numerical results demonstrate that our proposed LLM-enabled AO-based algorithm achieves fast convergence and near upper-bound performance with the GPT-o1 model, outperforming various benchmark schemes.
Abstract:Low-altitude economy (LAE) represents an emerging economic paradigm that redefines commercial and social aerial activities. Large artificial intelligence models (LAIMs) offer transformative potential to further enhance the intelligence of LAE services. However, deploying LAIMs in LAE poses several challenges, including the significant gap between their computational/storage demands and the limited onboard resources of LAE entities, the mismatch between lab-trained LAIMs and dynamic physical environments, and the inefficiencies of traditional decoupled designs for sensing, communication, and computation. To address these issues, we first propose a hierarchical system architecture tailored for LAIM deployment and present representative LAE application scenarios. Next, we explore key enabling techniques that facilitate the mutual co-evolution of LAIMs and low-altitude systems, and introduce a task-oriented execution pipeline for scalable and adaptive service delivery. Then, the proposed framework is validated through real-world case studies. Finally, we outline open challenges to inspire future research.
Abstract:This paper introduces a two-stage generative AI (GenAI) framework tailored for temporal spectrum cartography in low-altitude economy networks (LAENets). LAENets, characterized by diverse aerial devices such as UAVs, rely heavily on wireless communication technologies while facing challenges, including spectrum congestion and dynamic environmental interference. Traditional spectrum cartography methods have limitations in handling the temporal and spatial complexities inherent to these networks. Addressing these challenges, the proposed framework first employs a Reconstructive Masked Autoencoder (RecMAE) capable of accurately reconstructing spectrum maps from sparse and temporally varying sensor data using a novel dual-mask mechanism. This approach significantly enhances the precision of reconstructed radio frequency (RF) power maps. In the second stage, the Multi-agent Diffusion Policy (MADP) method integrates diffusion-based reinforcement learning to optimize the trajectories of dynamic UAV sensors. By leveraging temporal-attention encoding, this method effectively manages spatial exploration and exploitation to minimize cumulative reconstruction errors. Extensive numerical experiments validate that this integrated GenAI framework outperforms traditional interpolation methods and deep learning baselines by achieving 57.35% and 88.68% reconstruction error reduction, respectively. The proposed trajectory planner substantially improves spectrum map accuracy, reconstruction stability, and sensor deployment efficiency in dynamically evolving low-altitude environments.
Abstract:Mixture of Experts (MoE) has emerged as a promising paradigm for scaling model capacity while preserving computational efficiency, particularly in large-scale machine learning architectures such as large language models (LLMs). Recent advances in MoE have facilitated its adoption in wireless networks to address the increasing complexity and heterogeneity of modern communication systems. This paper presents a comprehensive survey of the MoE framework in wireless networks, highlighting its potential in optimizing resource efficiency, improving scalability, and enhancing adaptability across diverse network tasks. We first introduce the fundamental concepts of MoE, including various gating mechanisms and the integration with generative AI (GenAI) and reinforcement learning (RL). Subsequently, we discuss the extensive applications of MoE across critical wireless communication scenarios, such as vehicular networks, unmanned aerial vehicles (UAVs), satellite communications, heterogeneous networks, integrated sensing and communication (ISAC), and mobile edge networks. Furthermore, key applications in channel prediction, physical layer signal processing, radio resource management, network optimization, and security are thoroughly examined. Additionally, we present a detailed overview of open-source datasets that are widely used in MoE-based models to support diverse machine learning tasks. Finally, this survey identifies crucial future research directions for MoE, emphasizing the importance of advanced training techniques, resource-aware gating strategies, and deeper integration with emerging 6G technologies.
Abstract:Recent advancements in large language models (LLMs) have led to their widespread adoption and large-scale deployment across various domains. However, their environmental impact, particularly during inference, has become a growing concern due to their substantial energy consumption and carbon footprint. Existing research has focused on inference computation alone, overlooking the analysis and optimization of carbon footprint in network-aided LLM service systems. To address this gap, we propose AOLO, a framework for analysis and optimization for low-carbon oriented wireless LLM services. AOLO introduces a comprehensive carbon footprint model that quantifies greenhouse gas emissions across the entire LLM service chain, including computational inference and wireless communication. Furthermore, we formulate an optimization problem aimed at minimizing the overall carbon footprint, which is solved through joint optimization of inference outputs and transmit power under quality-of-experience and system performance constraints. To achieve this joint optimization, we leverage the energy efficiency of spiking neural networks (SNNs) by adopting SNN as the actor network and propose a low-carbon-oriented optimization algorithm, i.e., SNN-based deep reinforcement learning (SDRL). Comprehensive simulations demonstrate that SDRL algorithm significantly reduces overall carbon footprint, achieving an 18.77% reduction compared to the benchmark soft actor-critic, highlighting its potential for enabling more sustainable LLM inference services.
Abstract:Low-Altitude Economy Networks (LAENets) have emerged as significant enablers of social activities, offering low-altitude services such as the transportation of packages, groceries, and medical supplies. Unlike traditional terrestrial networks, LAENets are characterized by control mechanisms and ever-changing operational factors, which make them more complex and susceptible to vulnerabilities. As applications of LAENet continue to expand, robustness of these systems becomes crucial. In this paper, we investigate a novel application of Generative Artificial Intelligence (GenAI) to improve the robustness of LAENets. We conduct a systematic analysis of robustness requirements for LAENets, complemented by a comprehensive review of robust Quality of Service (QoS) metrics from the wireless physical layer perspective. We then investigate existing GenAI-enabled approaches for robustness enhancement. This leads to our proposal of a novel diffusion-based optimization framework with a Mixture of Expert (MoE)-transformer actor network. In the robust beamforming case study, the proposed framework demonstrates its effectiveness by optimizing beamforming under uncertainties, achieving a more than 44% increase in the worst-case achievable secrecy rate. These findings highlight the significant potential of GenAI in strengthening LAENet robustness.
Abstract:Integrated sensing and communication (ISAC) uses the same software and hardware resources to achieve both communication and sensing functionalities. Thus, it stands as one of the core technologies of 6G and has garnered significant attention in recent years. In ISAC systems, a variety of machine learning models are trained to analyze and identify signal patterns, thereby ensuring reliable sensing and communications. However, considering factors such as communication rates, costs, and privacy, collecting sufficient training data from various ISAC scenarios for these models is impractical. Hence, this paper introduces a generative AI (GenAI) enabled robust data augmentation scheme. The scheme first employs a conditioned diffusion model trained on a limited amount of collected CSI data to generate new samples, thereby expanding the sample quantity. Building on this, the scheme further utilizes another diffusion model to enhance the sample quality, thereby facilitating the data augmentation in scenarios where the original sensing data is insufficient and unevenly distributed. Moreover, we propose a novel algorithm to estimate the acceleration and jerk of signal propagation path length changes from CSI. We then use the proposed scheme to enhance the estimated parameters and detect the number of targets based on the enhanced data. The evaluation reveals that our scheme improves the detection performance by up to 70%, demonstrating reliability and robustness, which supports the deployment and practical use of the ISAC network.
Abstract:Due to massive computational demands of large generative models, AI-Generated Content (AIGC) can organize collaborative Mobile AIGC Service Providers (MASPs) at network edges to provide ubiquitous and customized content generation for resource-constrained users. However, such a paradigm faces two significant challenges: 1) raw prompts (i.e., the task description from users) often lead to poor generation quality due to users' lack of experience with specific AIGC models, and 2) static service provisioning fails to efficiently utilize computational and communication resources given the heterogeneity of AIGC tasks. To address these challenges, we propose an intelligent mobile AIGC service scheme. Firstly, we develop an interactive prompt engineering mechanism that leverages a Large Language Model (LLM) to generate customized prompt corpora and employs Inverse Reinforcement Learning (IRL) for policy imitation through small-scale expert demonstrations. Secondly, we formulate a dynamic mobile AIGC service provisioning problem that jointly optimizes the number of inference trials and transmission power allocation. Then, we propose the Diffusion-Enhanced Deep Deterministic Policy Gradient (D3PG) algorithm to solve the problem. By incorporating the diffusion process into Deep Reinforcement Learning (DRL) architecture, the environment exploration capability can be improved, thus adapting to varying mobile AIGC scenarios. Extensive experimental results demonstrate that our prompt engineering approach improves single-round generation success probability by 6.3 times, while D3PG increases the user service experience by 67.8% compared to baseline DRL approaches.
Abstract:This paper investigates the deep learning based approaches for simultaneous wireless information and power transfer (SWIPT). The quality-of-service (QoS) constrained sum-rate maximization problems are, respectively, formulated for power-splitting (PS) receivers and time-switching (TS) receivers and solved by a unified graph neural network (GNN) based model termed SWIPT net (SWIPTNet). To improve the performance of SWIPTNet, we first propose a single-type output method to reduce the learning complexity and facilitate the satisfaction of QoS constraints, and then, utilize the Laplace transform to enhance input features with the structural information. Besides, we adopt the multi-head attention and layer connection to enhance feature extracting. Furthermore, we present the implementation of transfer learning to the SWIPTNet between PS and TS receivers. Ablation studies show the effectiveness of key components in the SWIPTNet. Numerical results also demonstrate the capability of SWIPTNet in achieving near-optimal performance with millisecond-level inference speed which is much faster than the traditional optimization algorithms. We also show the effectiveness of transfer learning via fast convergence and expressive capability improvement.