Abstract: Recently, deep multi-agent reinforcement learning (MARL) has demonstrated promising performance on challenging tasks, such as those with long-term dependencies and non-Markovian environments. Its success is partly attributed to conditioning policies on a large fixed context length. However, such large fixed context lengths may lead to limited exploration efficiency and redundant information. In this paper, we propose a novel MARL framework to obtain adaptive and effective contextual information. Specifically, we design a central agent that dynamically optimizes the context length via temporal gradient analysis, enhancing exploration to facilitate convergence to global optima in MARL. Furthermore, to enhance the adaptive optimization capability of the context length, we present an efficient input representation for the central agent, which effectively filters redundant information. By leveraging a Fourier-based low-frequency truncation method, we extract global temporal trends across decentralized agents, providing an effective and efficient representation of the MARL environment. Extensive experiments demonstrate that the proposed method achieves state-of-the-art (SOTA) performance on long-term dependency tasks, including PettingZoo, MiniGrid, Google Research Football (GRF), and StarCraft Multi-Agent Challenge v2 (SMACv2).
Abstract: Adaptive Traffic Signal Control (ATSC) is a critical component of intelligent transportation systems, capable of significantly alleviating urban traffic congestion. Although reinforcement learning (RL)-based methods have demonstrated promising performance in achieving ATSC, existing methods are prone to converging to local optima. Consequently, this paper proposes a novel Bayesian Critique-Tune-Based Reinforcement Learning with Attention-Based Adaptive Pressure (BCT-APRL) for multi-intersection signal control. In BCT-APRL, the Critique-Tune (CT) framework, a two-layer Bayesian structure, is designed to refine the RL policies. Specifically, the Bayesian inference-based Critique layer provides effective evaluations of the credibility of policies; the Bayesian decision-based Tune layer fine-tunes policies by minimizing the posterior risks when the evaluations are negative. Furthermore, an attention-based Adaptive Pressure (AP) is designed to specify the traffic movement representation as an effective and efficient pressure of vehicle queues in the traffic network, enhancing the reasonableness of RL policies. Extensive experiments conducted with a simulator across a range of intersection layouts show that BCT-APRL is superior to other state-of-the-art methods on seven real-world datasets. Code is open-sourced.