Xiaolong Zhu

Benchmarking Robustness and Generalization in Multi-Agent Systems: A Case Study on Neural MMO

Aug 30, 2023
Yangkun Chen, Joseph Suarez, Junjie Zhang, Chenghui Yu, Bo Wu, Hanmo Chen, Hengman Zhu, Rui Du, Shanliang Qian, Shuai Liu, Weijun Hong, Jinke He, Yibing Zhang, Liang Zhao, Clare Zhu, Julian Togelius, Sharada Mohanty, Jiaxin Chen, Xiu Li, Xiaolong Zhu, Phillip Isola

We present the results of the second Neural MMO challenge, hosted at IJCAI 2022, which received more than 1,600 submissions. The competition targets robustness and generalization in multi-agent systems: participants train teams of agents to complete a multi-task objective against opponents not seen during training. It combines relatively complex environment design with large numbers of agents in the environment. The top submissions achieve strong results on this task using mostly standard reinforcement learning (RL) methods combined with domain-specific engineering. We summarize the competition design and results and suggest that, for the academic community, competitions may be a powerful approach to solving hard problems and establishing solid benchmarks for algorithms. We will open-source our benchmark, including the environment wrapper, baselines, a visualization tool, and selected policies, for further research.
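
To make the evaluation setting concrete, below is a minimal sketch of the generalization check described in the abstract: a trained team policy is matched against opponent teams held out during training. This is not the official benchmark API; make_env(), env.team_of(), policy.act(), and the dictionary-based step interface are all hypothetical placeholders.

```python
# Hypothetical sketch of the competition's evaluation idea: a trained team
# policy plays against opponent teams it never saw during training.
# make_env(), env.team_of(), policy.act(), and the dict-based step API are
# placeholders, not the real Neural MMO benchmark interface.
import random


def evaluate_generalization(team_policy, heldout_opponents, make_env, episodes=10):
    """Average return of team 0 (ours) against randomly drawn held-out opponents."""
    total_return = 0.0
    for _ in range(episodes):
        env = make_env()
        opponent = random.choice(heldout_opponents)   # unseen during training
        obs = env.reset()
        done, episode_return = False, 0.0
        while not done:
            actions = {}
            for agent_id, agent_obs in obs.items():
                policy = team_policy if env.team_of(agent_id) == 0 else opponent
                actions[agent_id] = policy.act(agent_id, agent_obs)
            obs, rewards, done, _ = env.step(actions)
            episode_return += sum(r for a, r in rewards.items()
                                  if env.team_of(a) == 0)
        total_return += episode_return
    return total_return / episodes
```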

Emergent collective intelligence from massive-agent cooperation and competition

Jan 05, 2023
Hanmo Chen, Stone Tao, Jiaxin Chen, Weihan Shen, Xihui Li, Chenghui Yu, Sikai Cheng, Xiaolong Zhu, Xiu Li

Inspired by organisms evolving through cooperation and competition between different populations on Earth, we study the emergence of artificial collective intelligence through massive-agent reinforcement learning. To this end, we propose a new massive-agent reinforcement learning environment, Lux, where dynamic, massive populations of agents in two teams scramble for limited resources and fight off the darkness. In Lux, we train our agents with a standard reinforcement learning algorithm over curriculum learning phases and leverage centralized control via a pixel-to-pixel policy network. As agents co-evolve through self-play, we observe several stages of intelligence, from the acquisition of atomic skills to the development of group strategies. Since these learned group strategies arise from individual decisions without an explicit coordination mechanism, we claim that artificial collective intelligence emerges from massive-agent cooperation and competition. We further analyze the emergence of various learned strategies through metrics and ablation studies, aiming to provide insights for reinforcement learning implementations in massive-agent environments.

* Published at NeurIPS 2022 Deep RL workshop. Code available at https://github.com/hanmochen/lux-open 
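
The centralized, pixel-to-pixel control described above can be pictured as a fully convolutional network that maps the whole map observation to per-cell action logits, so every unit reads its action from the cell it occupies. The PyTorch sketch below is illustrative only; the channel count, action space, and value head are assumptions, not the architecture reported in the paper.

```python
# Minimal sketch of a pixel-to-pixel (fully convolutional) policy for
# centralized control: one forward pass maps the map observation to per-cell
# action logits. Channel and action counts are illustrative assumptions.
import torch
import torch.nn as nn


class PixelToPixelPolicy(nn.Module):
    def __init__(self, in_channels=20, hidden=64, num_actions=6):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU(),
        )
        self.action_head = nn.Conv2d(hidden, num_actions, 1)  # logits per map cell
        self.value_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(hidden, 1)
        )

    def forward(self, obs):                  # obs: (B, C, H, W) map tensor
        feat = self.backbone(obs)
        return self.action_head(feat), self.value_head(feat)  # (B, A, H, W), (B, 1)
```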

Multi-Agent Path Finding via Tree LSTM

Oct 24, 2022
Yuhao Jiang, Kunjie Zhang, Qimai Li, Jiaxin Chen, Xiaolong Zhu

In recent years, Multi-Agent Path Finding (MAPF) has attracted attention from both Operations Research (OR) and Reinforcement Learning (RL). However, in the 2021 Flatland3 Challenge, a competition on MAPF, the best RL method scored only 27.9, far below the best OR method. This paper proposes a new RL solution to the Flatland3 Challenge that scores 125.3, several times higher than any previous RL solution. Our solution applies a novel network architecture, TreeLSTM, to MAPF. Together with several other RL techniques, including reward shaping, multiple-phase training, and centralized control, it is comparable to the top two to three OR methods.

* In submission to AAAI23-MAPF 
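
For reference, the TreeLSTM building block can be sketched as a Child-Sum Tree-LSTM cell (Tai et al., 2015), which aggregates an arbitrary number of child states into a parent state and so suits tree-structured observations. The PyTorch code below is a generic cell, not necessarily the exact variant or feature layout used in the Flatland3 solution.

```python
# Generic Child-Sum Tree-LSTM cell (Tai et al., 2015): the parent state is
# computed from the node's own features plus the summed hidden states of its
# children, with one forget gate per child.
import torch
import torch.nn as nn


class ChildSumTreeLSTMCell(nn.Module):
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.iou = nn.Linear(in_dim + hidden_dim, 3 * hidden_dim)
        self.f_x = nn.Linear(in_dim, hidden_dim)
        self.f_h = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, x, child_h, child_c):
        # x: (in_dim,) node features; child_h, child_c: (num_children, hidden_dim)
        h_sum = child_h.sum(dim=0)
        i, o, u = torch.chunk(self.iou(torch.cat([x, h_sum])), 3)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f = torch.sigmoid(self.f_x(x).unsqueeze(0) + self.f_h(child_h))  # per child
        c = i * u + (f * child_c).sum(dim=0)
        h = o * torch.tanh(c)
        return h, c
```

For a leaf node, child_h and child_c can be zero-size tensors of shape (0, hidden_dim), so the child sums vanish and the cell reduces to a plain LSTM update on the node's own features.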

A Multi-UAV System for Exploration and Target Finding in Cluttered and GPS-Denied Environments

Jul 19, 2021
Xiaolong Zhu, Fernando Vanegas, Felipe Gonzalez, Conrad Sanderson

The use of multi-rotor Unmanned Aerial Vehicles (UAVs) for search and rescue as well as remote sensing is rapidly increasing. Multi-rotor UAVs, however, have limited endurance. The range of UAV applications can be widened if teams of multiple UAVs are used. We propose a framework for a team of UAVs to cooperatively explore and find a target in complex GPS-denied environments with obstacles. The team of UAVs autonomously navigates, explores, detects, and finds the target in a cluttered environment with a known map. Examples of such environments include indoor scenarios, urban or natural canyons, caves, and tunnels, where the GPS signal is limited or blocked. The framework is based on a probabilistic decentralised Partially Observable Markov Decision Process (Dec-POMDP), which accounts for uncertainty in sensing and in the environment. The team cooperates efficiently, with each UAV sharing only limited processed observations and its location during the mission. The system is simulated using the Robot Operating System (ROS) and Gazebo. We test the performance of the system with an increasing number of UAVs in several indoor scenarios with obstacles. Results indicate that the proposed multi-UAV system improves time cost, the proportion of the search area surveyed, and success rates for search and rescue missions.
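
To illustrate how limited, processed observations might be fused across UAVs, the sketch below maintains a per-cell target belief grid and applies an independent Bayes update for each observed cell. The detection and false-alarm probabilities, the grid size, and the per-cell independence assumption are all illustrative choices, not the authors' Dec-POMDP formulation.

```python
# Illustrative sketch (not the authors' exact model): each UAV broadcasts a
# processed observation (covered cells plus detection flags), and every team
# member folds it into a shared per-cell target belief with a Bayes update.
import numpy as np


def fuse_observation(belief, observed_cells, detections,
                     p_detect=0.9, p_false=0.05):
    """Bayes-update a per-cell target belief grid from one UAV's observation.

    belief:         (H, W) prior probability that the target is in each cell
    observed_cells: list of (row, col) cells covered by the UAV's sensor
    detections:     list of bools, True if the sensor reported the target there
    """
    belief = belief.copy()
    for (r, c), hit in zip(observed_cells, detections):
        prior = belief[r, c]
        like_target = p_detect if hit else (1.0 - p_detect)     # target present
        like_empty = p_false if hit else (1.0 - p_false)        # target absent
        belief[r, c] = like_target * prior / (
            like_target * prior + like_empty * (1.0 - prior) + 1e-12)
    return belief


# Example: two UAVs share sparse observations of a 20x20 map.
belief = np.full((20, 20), 1.0 / 400)
belief = fuse_observation(belief, [(3, 4), (3, 5)], [False, True])
belief = fuse_observation(belief, [(3, 5)], [True])
```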
