Title: Off-Policy Multi-Agent Decomposed Policy Gradients

Jul 24, 2020

Yihan Wang, Beining Han, Tonghan Wang, Heng Dong, Chongjie Zhang

Abstract: Recently, multi-agent policy gradient (MAPG) methods have made vigorous progress. However, a performance gap remains between MAPG methods and state-of-the-art multi-agent value-based approaches. In this paper, we investigate the causes that hinder the performance of MAPG algorithms and present a multi-agent decomposed policy gradient method (DOP). This method introduces the idea of value function decomposition into the multi-agent actor-critic framework. Based on this idea, DOP supports efficient off-policy learning and addresses the issues of centralized-decentralized mismatch and credit assignment in both discrete and continuous action spaces. We formally show that DOP critics have sufficient representational capability to guarantee convergence. In addition, empirical evaluations on the StarCraft II micromanagement benchmark and multi-agent particle environments demonstrate that our method significantly outperforms state-of-the-art value-based and policy-based multi-agent reinforcement learning algorithms. Demonstrative videos are available at https://sites.google.com/view/dop-mapg.
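The abstract says DOP brings value function decomposition into the actor-critic framework but does not spell out the critic's form. The sketch below assumes a linearly decomposed centralized critic, Q_tot(s, a) = Σ_i k_i(s) Q_i(s, a_i) + b(s) with non-negative weights k_i, and shows how such a critic turns into a separate policy gradient for each agent in a discrete action space. The function names (`mixed_q`, `decomposed_pg`), the softmax policies, and the expected-value baseline are illustrative assumptions, not code or definitions taken from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one agent's action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def mixed_q(k: Sequence[float], q_local: Sequence[float], b: float) -> float:
    """Assumed linear mixing: Q_tot = sum_i k_i * Q_i(a_i) + b, with k_i >= 0."""
    assert all(ki >= 0 for ki in k), "mixing weights must be non-negative"
    return sum(ki * qi for ki, qi in zip(k, q_local)) + b


def decomposed_pg(
    k: Sequence[float],
    logits_per_agent: Sequence[Sequence[float]],
    q_per_agent: Sequence[Sequence[float]],
    actions: Sequence[int],
) -> List[List[float]]:
    """Per-agent gradient of k_i * log pi_i(a_i) * A_i with respect to agent i's logits.

    Because Q_tot is linear in each Q_i, agent i only needs its own local
    critic Q_i, scaled by k_i. The baseline is the expected local value under
    pi_i, which leaves the expected gradient unchanged but reduces variance.
    """
    grads = []
    for ki, logits, q_i, a_i in zip(k, logits_per_agent, q_per_agent, actions):
        pi = softmax(logits)
        baseline = sum(p * q for p, q in zip(pi, q_i))
        advantage = q_i[a_i] - baseline
        # d log softmax(logits)[a_i] / d logits[j] = 1[j == a_i] - pi[j]
        grads.append(
            [ki * advantage * ((1.0 if j == a_i else 0.0) - pi[j]) for j in range(len(logits))]
        )
    return grads


if __name__ == "__main__":
    k = [2.0, 0.0]  # agent 2 gets zero credit in this state
    print(mixed_q(k, [3.0, 1.0], 0.5))
    print(decomposed_pg(k, [[0.0, 0.0], [0.0, 0.0]], [[1.0, 3.0], [5.0, 2.0]], [1, 0]))
```

The weights k_i act as per-agent credit: an agent whose weight is zero in a given state receives no gradient from that state, while the decomposition lets each actor be updated from its own local critic rather than from the full joint-action value.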