This paper investigates unmanned aerial vehicle (UAV)-assisted resilience in a 6G network energy saving (NES) scenario. Specifically, we consider a terrestrial network of multiple ground base stations (GBSs), each comprising three sectors/cells, in which several cells are switched off due to NES or incidents such as disasters, hardware failures, or outages. To address this, we propose a Multi-Agent Deep Deterministic Policy Gradient (MADDPG) framework that enables UAV-assisted communication by jointly optimizing UAV trajectories, transmission power, and user-UAV association under a sleeping-GBS strategy. The framework aims to ensure the resilience of active users in the network and the long-term operability of the UAVs: it maximizes service coverage for users located in outage or NES zones while minimizing UAV energy consumption. Simulation results demonstrate that the proposed MADDPG policy consistently achieves a high coverage ratio across testing episodes, outperforming the other baselines. Moreover, the MADDPG framework attains the lowest total energy consumption, a reduction of approximately 24% compared to the conventional all-GBS-ON configuration, while maintaining a comparable user service rate. These results confirm that the proposed approach achieves a superior trade-off between energy efficiency and service performance, supporting the development of sustainable and resilient UAV-assisted cellular networks.
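
As a rough illustration of the MADDPG machinery referenced in this abstract, the following sketch shows a centralized critic over all UAV agents and the deterministic policy-gradient actor loss; the network sizes, action layout, and hyperparameters are assumptions, not the paper's implementation.

```python
# Minimal MADDPG update sketch (PyTorch). The UAV action layout (e.g., 2-D velocity,
# transmit power, association logits) and all sizes are illustrative assumptions.
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, act_dim), nn.Tanh())
    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    """Q(o_1..o_N, a_1..a_N): sees all agents' observations and actions during training."""
    def __init__(self, joint_obs_dim, joint_act_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(joint_obs_dim + joint_act_dim, 256),
                                 nn.ReLU(), nn.Linear(256, 1))
    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

def maddpg_actor_loss(actor_i, critic_i, batch_obs, batch_act, agent_i):
    """Deterministic policy gradient for agent i: replace a_i with actor_i(o_i),
    keep the other agents' sampled actions fixed, and ascend Q_i."""
    act = list(batch_act)                       # per-agent action tensors
    act[agent_i] = actor_i(batch_obs[agent_i])  # differentiable action of agent i
    q = critic_i(torch.cat(batch_obs, dim=-1), torch.cat(act, dim=-1))
    return -q.mean()
```
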
Space-air-ground integrated multi-access edge computing (SAGIN-MEC) provides a promising solution for the rapidly developing low-altitude economy (LAE) to deliver flexible and wide-area computing services. However, fully realizing the potential of SAGIN-MEC in the LAE presents significant challenges, including coordinating decisions across heterogeneous nodes with different roles, modeling complex factors such as mobility and network variability, and handling real-time decision-making in a partially observable environment with hybrid variables. To address these challenges, we first present a hierarchical SAGIN-MEC architecture that enables coordination among user devices (UDs), uncrewed aerial vehicles (UAVs), and satellites. Then, we formulate a UD cost minimization optimization problem (UCMOP) that minimizes the UD cost by jointly optimizing the task offloading ratio, UAV trajectory planning, computing resource allocation, and UD association. We show that the UCMOP is an NP-hard problem. To overcome this challenge, we propose a multi-agent deep deterministic policy gradient (MADDPG)-convex optimization and coalitional game (MADDPG-COCG) algorithm. Specifically, we employ the MADDPG algorithm to optimize the continuous temporal decisions for heterogeneous nodes in the partially observable SAGIN-MEC system. Moreover, we propose a convex optimization and coalitional game (COCG) method that enhances the conventional MADDPG by deterministically handling the hybrid and varying-dimensional decisions. Simulation results demonstrate that the proposed MADDPG-COCG algorithm significantly enhances user-centric performance in terms of the aggregated UD cost, task completion delay, and UD energy consumption, with a slight increase in UAV energy consumption, compared to the benchmark algorithms. Moreover, the MADDPG-COCG algorithm shows superior convergence stability and scalability.
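
The convex-optimization piece of a COCG-style step could, under simple assumptions, look like the sketch below, which allocates a server's CPU cycles across associated UDs to minimize total computing delay; the cost model, the symbols (task cycles c_k, capacity F), and the use of cvxpy are illustrative assumptions rather than the paper's exact formulation.

```python
# Given fixed offloading decisions, split a server's capacity F across UDs to
# minimize total computing delay sum_k c_k / f_k (convex in f). Illustrative only.
import cvxpy as cp
import numpy as np

def allocate_cpu(task_cycles, total_capacity):
    f = cp.Variable(len(task_cycles), pos=True)            # cycles/s per UD
    delay = cp.sum(cp.multiply(task_cycles, cp.inv_pos(f)))  # convex objective
    prob = cp.Problem(cp.Minimize(delay), [cp.sum(f) <= total_capacity])
    prob.solve()
    return f.value

print(allocate_cpu(np.array([2e9, 5e9, 1e9]), total_capacity=1e10))
```

For this simple model the optimum allocates capacity proportionally to the square root of each task's cycle count, which the solver recovers numerically.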




Bus bunching remains a challenge for urban transit due to stochastic traffic and passenger demand. Traditional solutions rely on multi-agent reinforcement learning (MARL) in loop-line settings, which overlook realistic operations characterized by heterogeneous routes, timetables, fluctuating demand, and varying fleet sizes. We propose a novel single-agent reinforcement learning (RL) framework for bus holding control that avoids the data imbalance and convergence issues of MARL under near-realistic simulation. A bidirectional timetabled network with dynamic passenger demand is constructed. The key innovation is reformulating the multi-agent problem into a single-agent one by augmenting the state space with categorical identifiers (vehicle ID, station ID, time period) in addition to numerical features (headway, occupancy, velocity). This high-dimensional encoding enables single-agent policies to capture inter-agent dependencies, analogous to projecting non-separable inputs into a higher-dimensional space. We further design a structured reward function aligned with operational goals: instead of exponential penalties on headway deviations, a ridge-shaped reward balances uniform headways and schedule adherence. Experiments show that our modified soft actor-critic (SAC) achieves more stable and superior performance than benchmarks, including MADDPG (e.g., -430k vs. -530k under stochastic conditions). These results demonstrate that single-agent deep RL, when enhanced with categorical structuring and schedule-aware rewards, can effectively manage bus holding in non-loop, real-world contexts. This paradigm offers a robust, scalable alternative to MARL frameworks, particularly where agent-specific experiences are imbalanced.
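
To make the state-augmentation and reward ideas concrete, here is a minimal sketch in which the categorical identifiers are one-hot encoded alongside the numeric features and a ridge-shaped reward trades off headway uniformity against schedule adherence; the feature list, one-hot sizes, and reward constants are assumptions.

```python
# Sketch of categorical state augmentation and a ridge-shaped holding reward.
import numpy as np

def build_state(headway, occupancy, velocity, vehicle_id, station_id, period,
                n_vehicles=20, n_stations=30, n_periods=4):
    onehot = np.zeros(n_vehicles + n_stations + n_periods)
    onehot[vehicle_id] = 1.0
    onehot[n_vehicles + station_id] = 1.0
    onehot[n_vehicles + n_stations + period] = 1.0
    return np.concatenate(([headway, occupancy, velocity], onehot))

def ridge_reward(forward_headway, backward_headway, schedule_deviation,
                 alpha=1.0, beta=0.5):
    """Highest reward along the 'ridge' where neighboring headways are equal and
    the bus stays close to its timetable; decays linearly away from it."""
    headway_balance = -alpha * abs(forward_headway - backward_headway)
    schedule_term = -beta * abs(schedule_deviation)
    return headway_balance + schedule_term
```
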
Unmanned aerial vehicles (UAVs) have recently been utilized in multi-access edge computing (MEC) as edge servers. It is desirable to design the UAVs' trajectories and user-to-UAV assignments so as to ensure satisfactory service to the users and energy-efficient operation simultaneously. The posed optimization problem is challenging to solve because: (i) the formulated problem is non-convex; (ii) due to the mobility of the ground users, their future positions and channel gains are not known in advance; and (iii) the UAVs' local observations must be communicated to a central entity that solves the optimization problem. Such (semi-)centralized processing leads to communication overhead, communication/processing bottlenecks, lack of flexibility and scalability, and loss of robustness to system failures. To simultaneously address all these limitations, we advocate a fully decentralized setup with no centralized entity. Each UAV obtains its local observation and then communicates with its immediate neighbors only. After sharing information with its neighbors, each UAV determines its next position via a locally run deep reinforcement learning (DRL) algorithm. None of the UAVs need to know the global communication graph. The two main components of our proposed solution are (i) graph attention (GAT) layers and (ii) experience- and parameter-sharing proximal policy optimization (EPS-PPO). Our proposed approach eliminates all the limitations of semi-centralized MADRL methods such as MAPPO and multi-agent deep deterministic policy gradient (MADDPG), while guaranteeing better performance than independent local DRL methods such as IPPO. Numerical results reveal notable performance gains in several different criteria compared to the existing MADDPG algorithm, demonstrating the potential for offering better performance while utilizing local communications only.
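
A minimal sketch of the neighbor-only attention aggregation suggested by the GAT component is shown below; it uses standard GAT-style additive attention over immediate neighbors' messages, and the dimensions and scoring form are assumptions rather than the authors' exact layer.

```python
# Single-head attention over an agent's immediate neighbors (GAT-style sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeighborAttention(nn.Module):
    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.proj = nn.Linear(feat_dim, hidden_dim, bias=False)
        self.attn = nn.Linear(2 * hidden_dim, 1, bias=False)

    def forward(self, own_feat, neighbor_feats):
        # own_feat: (feat_dim,), neighbor_feats: (num_neighbors, feat_dim)
        h_i = self.proj(own_feat)                              # (hidden,)
        h_j = self.proj(neighbor_feats)                        # (k, hidden)
        pair = torch.cat([h_i.expand_as(h_j), h_j], dim=-1)    # (k, 2*hidden)
        scores = F.leaky_relu(self.attn(pair)).squeeze(-1)     # (k,)
        weights = torch.softmax(scores, dim=0)
        return h_i + (weights.unsqueeze(-1) * h_j).sum(dim=0)  # aggregated embedding
```
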
Multi-agent reinforcement learning (MARL) faces significant challenges in task sequencing and curriculum design, particularly for cooperative coordination scenarios. While curriculum learning has demonstrated success in single-agent domains, principled approaches for multi-agent coordination remain limited due to the absence of validated task complexity metrics. This work presents a graph-based coordination complexity metric that integrates agent dependency entropy, spatial interference patterns, and goal overlap analysis to predict task difficulty in multi-agent environments. The metric achieves strong empirical validation, with a correlation of rho = 0.952 (p < 0.001) between predicted complexity and empirical difficulty, where difficulty is determined by random-agent performance evaluation. We evaluate the curriculum learning framework using MADDPG across two distinct coordination environments: achieving a 56x performance improvement in tight coordination tasks (MultiWalker) and demonstrating systematic task progression in cooperative navigation (Simple Spread). Systematic analysis shows that coordination tightness is a predictor of curriculum learning effectiveness: environments requiring strict agent interdependence benefit substantially from structured progression. This work provides a validated complexity metric for multi-agent curriculum design and establishes empirical guidelines for multi-robot coordination applications.
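
The structure of such a composite complexity metric can be sketched as a weighted combination of dependency entropy, spatial interference, and goal overlap, as below; the component definitions and weights are illustrative assumptions, not the paper's exact formulas.

```python
# Illustrative composition of a graph-based coordination complexity score.
import numpy as np

def dependency_entropy(dependency_matrix):
    """Entropy of the normalized agent-dependency graph weights."""
    p = dependency_matrix / dependency_matrix.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def coordination_complexity(dependency_matrix, agent_positions, goal_positions,
                            w=(0.4, 0.3, 0.3), interference_radius=1.0):
    n = len(agent_positions)
    pairwise = np.linalg.norm(agent_positions[:, None] - agent_positions[None, :], axis=-1)
    interference = (pairwise[np.triu_indices(n, 1)] < interference_radius).mean()
    goal_dists = np.linalg.norm(goal_positions[:, None] - goal_positions[None, :], axis=-1)
    goal_overlap = np.exp(-goal_dists[np.triu_indices(n, 1)]).mean()
    return (w[0] * dependency_entropy(dependency_matrix)
            + w[1] * interference + w[2] * goal_overlap)
```
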




The electrification of transportation and the increased adoption of decentralized renewable energy generation have added complexity to managing Renewable Energy Communities (RECs). Integrating Electric Vehicle (EV) charging with building energy systems such as heating, ventilation, and air conditioning (HVAC), photovoltaic (PV) generation, and battery storage presents significant opportunities but also practical challenges. Reinforcement learning (RL), particularly Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithms, has shown promising results in simulation, outperforming heuristic control strategies. However, translating these successes into real-world deployments faces substantial challenges, including incomplete and noisy data, integration of heterogeneous subsystems, synchronization issues, unpredictable occupant behavior, and missing critical EV state-of-charge (SoC) information. This paper introduces a framework designed explicitly to handle these complexities and bridge the simulation-to-reality gap. The framework incorporates EnergAIze, a MADDPG-based multi-agent control strategy, and specifically addresses challenges related to real-world data collection, system integration, and user behavior modeling. Preliminary results collected from a real-world operational REC with four residential buildings demonstrate the practical feasibility of our approach, achieving an average 9% reduction in daily peak demand and a 5% decrease in energy costs through optimized load scheduling and EV charging behaviors. These outcomes underscore the framework's effectiveness, advancing the practical deployment of intelligent energy management solutions in RECs.
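
For reference, the reported evaluation quantities can be computed from metered load profiles roughly as in the sketch below, which compares daily peaks and time-of-use energy costs against a baseline schedule; the tariff structure and data layout are assumptions.

```python
# Sketch of the peak-demand-reduction and cost-savings metrics from load data.
import numpy as np

def daily_peak_reduction(baseline_kw, controlled_kw):
    """Both inputs: (days, timesteps) arrays of community net load in kW."""
    return 1.0 - controlled_kw.max(axis=1).mean() / baseline_kw.max(axis=1).mean()

def cost_savings(baseline_kwh, controlled_kwh, price_per_kwh):
    """price_per_kwh: (timesteps,) time-of-use tariff in currency/kWh."""
    base_cost = (baseline_kwh * price_per_kwh).sum()
    ctrl_cost = (controlled_kwh * price_per_kwh).sum()
    return 1.0 - ctrl_cost / base_cost
```
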
Artificial Intelligence (AI)-driven convolutional neural networks enhance rescue, inspection, and surveillance tasks performed by low-altitude uncrewed aerial vehicles (UAVs) and ground computing nodes (GCNs) in unknown environments. However, their high computational demands often exceed a single UAV's capacity, leading to system instability, which is further exacerbated by the limited and dynamic resources of GCNs. To address these challenges, this paper proposes a novel cooperation framework involving UAVs, ground-embedded robots (GERs), and high-altitude platforms (HAPs), which enables resource pooling through UAV-to-GER (U2G) and UAV-to-HAP (U2H) communications to provide computing services for UAV-offloaded tasks. Specifically, we formulate the multi-objective problem of task assignment and exploration optimization in UAVs as a dynamic long-term optimization problem. Our objective is to minimize task completion time and energy consumption while ensuring system stability over time. To achieve this, we first employ the Lyapunov optimization technique to transform the original problem, with its stability constraints, into a per-slot deterministic problem. We then propose an algorithm named HG-MADDPG, which combines the Hungarian algorithm with a generative diffusion model (GDM)-based multi-agent deep deterministic policy gradient (MADDPG) approach. Within HG-MADDPG, the Hungarian algorithm performs exploration-area selection, improving the efficiency of UAV interaction with the environment, and the GDM is integrated with MADDPG to optimize task-assignment decisions such as task offloading and resource allocation. Simulation results demonstrate the effectiveness of the proposed approach, with significant improvements in task offloading efficiency, latency reduction, and system stability compared to baseline methods.
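
The Hungarian-algorithm step for exploration-area selection admits a compact sketch with SciPy's linear_sum_assignment, assigning each UAV to an area under an assumed distance-based cost; the cost function and its weighting are illustrative assumptions.

```python
# Hungarian assignment of UAVs to exploration areas under an assumed cost model.
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_exploration_areas(uav_positions, area_centers, energy_weight=0.1):
    # cost[i, j]: assumed cost for UAV i to cover area j (distance-proportional)
    dist = np.linalg.norm(uav_positions[:, None, :] - area_centers[None, :, :], axis=-1)
    cost = energy_weight * dist
    uav_idx, area_idx = linear_sum_assignment(cost)   # Hungarian algorithm
    return dict(zip(uav_idx.tolist(), area_idx.tolist()))
```
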




In this paper, we devise three actor-critic algorithms with decentralized training for multi-agent reinforcement learning in cooperative, adversarial, and mixed settings with continuous action spaces. To this end, we adapt the MADDPG algorithm by introducing networked communication between agents. We introduce surrogate policies to decentralize training while still allowing local communication during training. In empirical tests, the decentralized algorithms achieve results comparable to the original MADDPG while reducing computational cost, an advantage that becomes more pronounced as the number of agents grows.
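
One plausible reading of the surrogate-policy mechanism is sketched below: each agent keeps frozen local copies of its immediate neighbors' actors, refreshed only over local communication links; the data structures and the refresh rule are assumptions used for illustration, not the authors' exact scheme.

```python
# Refresh each agent's surrogate copies of its neighbors' actors over local links only.
import copy

def exchange_surrogates(agents, adjacency):
    """agents: list of objects with .actor (nn.Module) and .surrogates (dict);
    adjacency: dict agent_id -> list of neighbor ids (the communication graph)."""
    for i, agent in enumerate(agents):
        for j in adjacency[i]:
            # Store a frozen local copy of the neighbor's current actor; it stands in
            # for the centralized access to other agents' policies in vanilla MADDPG.
            agent.surrogates[j] = copy.deepcopy(agents[j].actor)
            for p in agent.surrogates[j].parameters():
                p.requires_grad_(False)
```
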




The multi-tier vehicular Metaverse promises to transform vehicles into essential nodes of an interconnected digital ecosystem through efficient resource allocation and seamless vehicular twin (VT) migration. Existing techniques, however, struggle to deliver this in a highly dynamic vehicular environment, since they cannot adequately balance conflicting objectives such as latency reduction, resource utilization, and user experience (UX). To address these challenges, we introduce a novel multi-tier resource allocation and VT migration framework that integrates Graph Convolutional Networks (GCNs), a hierarchical Stackelberg game-based incentive mechanism, and Multi-Agent Deep Reinforcement Learning (MADRL). The GCN-based model captures both spatial and temporal dependencies within the vehicular network; the Stackelberg game-based incentive mechanism fosters cooperation between vehicles and infrastructure; and the MADRL algorithm jointly optimizes resource allocation and VT migration in real time. By modeling this dynamic, multi-tier vehicular Metaverse as a Markov Decision Process (MDP), we develop a MADRL-based algorithm dubbed the Multi-Objective Multi-Agent Deep Deterministic Policy Gradient (MO-MADDPG), which effectively balances these conflicting objectives. Extensive simulations validate the effectiveness of the algorithm, which enhances scalability, reliability, and efficiency while improving latency, resource utilization, migration cost, and overall UX by 12.8%, 9.7%, 14.2%, and 16.1%, respectively.
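
A common way to fold such conflicting objectives into a single multi-objective MADDPG training signal is weighted scalarization, sketched below; the weights and normalizing constants are assumptions, and the paper may balance the objectives differently.

```python
# Weighted scalarization of latency, utilization, migration cost, and UX terms.
import numpy as np

def scalarized_reward(latency, utilization, migration_cost, ux,
                      weights=(0.3, 0.2, 0.2, 0.3),
                      norms=(100.0, 1.0, 50.0, 1.0)):
    # Lower-is-better terms enter negatively; higher-is-better terms positively.
    terms = np.array([-latency / norms[0],
                      utilization / norms[1],
                      -migration_cost / norms[2],
                      ux / norms[3]])
    return float(np.dot(weights, terms))
```
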




Mobile edge computing (MEC) empowers mobile devices (MDs) to support artificial intelligence (AI) applications through collaboration with proximal MEC servers. Unfortunately, despite the great promise of device-edge cooperative AI inference, data privacy is an increasing concern. In this paper, we develop a privacy-aware multi-device cooperative edge inference system for classification tasks, which integrates a distributed bidding mechanism for the MEC server's computational resources. Intermediate feature compression is adopted as a principled approach to minimize data privacy leakage. To determine the bidding values and feature compression ratios in a distributed fashion, we formulate a decentralized partially observable Markov decision process (DEC-POMDP) model, for which a multi-agent deep deterministic policy gradient (MADDPG)-based algorithm is developed. Simulation results demonstrate the effectiveness of the proposed algorithm in privacy-preserving cooperative edge inference. Specifically, given a sufficient level of data privacy protection, the proposed algorithm achieves 0.31-0.95% improvements in classification accuracy compared to an approach that is agnostic to the wireless channel conditions. Performance is further improved by 1.54-1.67% when the difficulty of the inference data is taken into account.
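
A per-device actor for this DEC-POMDP could output both the bid and the feature-compression ratio from a local observation, as in the sketch below; the observation contents, output squashing, and value ranges are assumptions.

```python
# Per-device actor mapping a local observation to a bid and a compression ratio.
import torch
import torch.nn as nn

class DeviceActor(nn.Module):
    def __init__(self, obs_dim, max_bid=10.0, min_ratio=0.05):
        super().__init__()
        self.max_bid, self.min_ratio = max_bid, min_ratio
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 2), nn.Sigmoid())

    def forward(self, obs):
        out = self.net(obs)
        bid = self.max_bid * out[..., 0]                              # bidding value
        ratio = self.min_ratio + (1 - self.min_ratio) * out[..., 1]   # kept fraction of features
        return bid, ratio
```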