Solving partially-observable Markov decision processes (POMDPs) is critical when applying deep reinforcement learning (DRL) to real-world robotics problems, where agents have an incomplete view of the world. We present graph convolutional memory (GCM) for solving POMDPs using deep reinforcement learning. Unlike recurrent neural networks (RNNs) or transformers, GCM embeds domain-specific priors into the memory recall process via a knowledge graph. By encapsulating priors in the graph, GCM adapts to specific tasks but remains applicable to any DRL task. Using graph convolutions, GCM extracts hierarchical graph features, analogous to image features in a convolutional neural network (CNN). We show GCM outperforms long short-term memory (LSTM), gated transformers for reinforcement learning (GTrXL), and differentiable neural computers (DNCs) on control, long-term non-sequential recall, and 3D navigation tasks while using significantly fewer parameters.
In this paper, we develop a learning-based approach for decentralized submodular maximization. We focus on applications where robots are required to jointly select actions, e.g., motion primitives, to maximize team submodular objectives with local communications only. Such applications are essential for large-scale multi-robot coordination such as multi-robot motion planning for area coverage, environment exploration, and target tracking. But the current decentralized submodular maximization algorithms either require assumptions on the inter-robot communication or lose some suboptimal guarantees. In this work, we propose a general-purpose learning architecture towards submodular maximization at scale, with decentralized communications. Particularly, our learning architecture leverages a graph neural network (GNN) to capture local interactions of the robots and learns decentralized decision-making for the robots. We train the learning model by imitating an expert solution and implement the resulting model for decentralized action selection involving local observations and communications only. We demonstrate the performance of our GNN-based learning approach in a scenario of active target coverage with large networks of robots. The simulation results show our approach nearly matches the coverage performance of the expert algorithm, and yet runs several orders faster with more than 30 robots. The results also exhibit our approach's generalization capability in previously unseen scenarios, e.g., larger environments and larger networks of robots.
Recent work in the multi-agent domain has shown the promise of Graph Neural Networks (GNNs) to learn complex coordination strategies. However, most current approaches use minor variants of a Graph Convolutional Network (GCN), which applies a convolution to the communication graph formed by the multi-agent system. In this paper, we investigate whether the performance and generalization of GCNs can be improved upon. We introduce ModGNN, a decentralized framework which serves as a generalization of GCNs, providing more flexibility. To test our hypothesis, we evaluate an implementation of ModGNN against several baselines in the multi-agent flocking problem. We perform an ablation analysis to show that the most important component of our framework is one that does not exist in a GCN. By varying the number of agents, we also demonstrate that an application-agnostic implementation of ModGNN possesses an improved ability to generalize to new environments.
We consider a pursuit-evasion problem with a heterogeneous team of multiple pursuers and multiple evaders. Although both the pursuers (robots) and the evaders are aware of each others' control and assignment strategies, they do not have exact information about the other type of agents' location or action. Using only noisy on-board sensors the pursuers (or evaders) make probabilistic estimation of positions of the evaders (or pursuers). Each type of agent use Markov localization to update the probability distribution of the other type. A search-based control strategy is developed for the pursuers that intrinsically takes the probability distribution of the evaders into account. Pursuers are assigned using an assignment algorithm that takes redundancy (i.e., an excess in the number of pursuers than the number of evaders) into account, such that the total or maximum estimated time to capture the evaders is minimized. In this respect we assume the pursuers to have clear advantage over the evaders. However, the objective of this work is to use assignment strategies that minimize the capture time. This assignment strategy is based on a modified Hungarian algorithm as well as a novel algorithm for determining assignment of redundant pursuers. The evaders, in order to effectively avoid the pursuers, predict the assignment based on their probabilistic knowledge of the pursuers and use a control strategy to actively move away from those pursues. Our experimental evaluation shows that the redundant assignment algorithm performs better than an alternative nearest-neighbor based assignment algorithm.
We study the consideration of fairness in redundant assignment for multi-agent task allocation. It has recently been shown that redundant assignment of agents to tasks provides robustness to uncertainty in task performance. However, the question of how to fairly assign these redundant resources across tasks remains unaddressed. In this paper, we present a novel problem formulation for fair redundant task allocation, which we cast as the optimization of worst-case task costs under a cardinality constraint. Solving this problem optimally is NP-hard. We exploit properties of supermodularity to propose a polynomial-time, near-optimal solution. In supermodular redundant assignment, the use of additional agents always improves task costs. Therefore, we provide a solution set that is $\alpha$ times larger than the cardinality constraint. This constraint relaxation enables our approach to achieve a super-optimal cost by using a sub-optimal assignment size. We derive the sub-optimality bound on this cardinality relaxation, $\alpha$. Additionally, we demonstrate that our algorithm performs near-optimally without the cardinality relaxation. We show simulations of redundant assignments of robots to goal nodes on transport networks with uncertain travel times. Empirically, our algorithm outperforms benchmarks, scales to large problems, and provides improvements in both fairness and average utility.
Robotic simulators are crucial for academic research and education as well as the development of safety-critical applications. Reinforcement learning environments -- simple simulations coupled with a problem specification in the form of a reward function -- are also important to standardize the development (and benchmarking) of learning algorithms. Yet, full-scale simulators typically lack portability and parallelizability. Vice versa, many reinforcement learning environments trade-off realism for high sample throughputs in toy-like problems. While public data sets have greatly benefited deep learning and computer vision, we still lack the software tools to simultaneously develop -- and fairly compare -- control theory and reinforcement learning approaches. In this paper, we propose an open-source OpenAI Gym-like environment for multiple quadcopters based on the Bullet physics engine. Its multi-agent and vision based reinforcement learning interfaces, as well as the support of realistic collisions and aerodynamic effects, make it, to the best of our knowledge, a first of its kind. We demonstrate its use through several examples, either for control (trajectory tracking with PID control, multi-robot flight with downwash, etc.) or reinforcement learning (single and multi-agent stabilization tasks), hoping to inspire future research that combines control theory and machine learning.
Dynamical systems consisting of a set of autonomous agents face the challenge of having to accomplish a global task, relying only on local information. While centralized controllers are readily available, they face limitations in terms of scalability and implementation, as they do not respect the distributed information structure imposed by the network system of agents. Given the difficulties in finding optimal decentralized controllers, we propose a novel framework using graph neural networks (GNNs) to learn these controllers. GNNs are well-suited for the task since they are naturally distributed architectures and exhibit good scalability and transferability properties. The problems of flocking and multi-agent path planning are explored to illustrate the potential of GNNs in learning decentralized controllers.
In this paper, we consider the problem of providing robustness to adversarial communication in multi-agent systems. Specifically, we propose a solution towards robust cooperation, which enables the multi-agent system to maintain high performance in the presence of anonymous non-cooperative agents that communicate faulty, misleading or manipulative information. In pursuit of this goal, we propose a communication architecture based on Graph Neural Networks (GNNs), which is amenable to a novel Gaussian Process (GP)-based probabilistic model characterizing the mutual information between the simultaneous communications of different agents due to their physical proximity and relative position. This model allows agents to locally compute approximate posterior probabilities, or confidences, that any given one of their communication partners is being truthful. These confidences can be used as weights in a message filtering scheme, thereby suppressing the influence of suspicious communication on the receiving agent's decisions. In order to assess the efficacy of our method, we introduce a taxonomy of non-cooperative agents, which distinguishes them by the amount of information available to them. We demonstrate in two distinct experiments that our method performs well across this taxonomy, outperforming alternative methods. For all but the best informed adversaries, our filtering method is able to reduce the impact that non-cooperative agents cause, reducing it to the point of negligibility, and with negligible cost to performance in the absence of adversaries.