Abstract: A networked aerial robot team (NART) comprises a group of agents (e.g., unmanned aerial vehicles (UAVs) and ground control stations) interconnected by wireless links. Inter-agent connectivity, even if intermittent (i.e., sparse), enables data exchanges between agents and supports cooperative behaviours in several NART missions. Such connectivity can benefit online decentralised decision-making and group resilience, particularly when prior knowledge is inaccurate or incomplete. These connectivity requirements can be accounted for in the offline mission planning stages to incentivise cooperative behaviours and improve mission efficiency during the NART deployment. This paper proposes a novel path planning tool for a Sparse, Aware, and Cooperative Networked Aerial Robot Team (SpArC-NART) in exploration missions. It simultaneously considers different levels of prior information regarding the environment, limited agent energy, sensing, and communication, as well as distinct NART constitutions. The communication model takes into account the limitations of user-defined radio technology and physical phenomena. The proposed tool aims to maximise the achievement of the mission goals (e.g., finding one or more targets or covering the full area of the environment), while having agents cooperate to reduce their reporting times, increase their global situational awareness (e.g., their knowledge of the environment), and facilitate mission replanning, if required. The developed cooperation mechanism leverages soft-motion constraints and dynamic rewards based on the Value of Movement and the expected communication availability between the agents at each time step. A ground sensing coverage use case was chosen to illustrate the current capabilities of this tool.
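As a rough illustration of how a per-step reward might combine a movement value with expected communication availability, the sketch below uses a simple free-space path loss link budget. This is not the paper's communication model or its Value of Movement definition: the function names, the 2.4 GHz carrier, the transmit power, the receiver sensitivity, the `exploration_gain` placeholder, and the `comm_weight` factor are all assumptions chosen for illustration.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss in dB for a given distance (m) and carrier (Hz)."""
    d = max(distance_m, 1e-3)  # avoid log10(0) for co-located agents
    return 20.0 * math.log10(4.0 * math.pi * d * freq_hz / SPEED_OF_LIGHT)


def link_available(distance_m: float,
                   tx_power_dbm: float = 20.0,      # assumed radio parameter
                   sensitivity_dbm: float = -90.0,  # assumed radio parameter
                   freq_hz: float = 2.4e9) -> bool:  # assumed carrier
    """A link counts as available if received power meets the sensitivity."""
    rx_power_dbm = tx_power_dbm - fspl_db(distance_m, freq_hz)
    return rx_power_dbm >= sensitivity_dbm


def step_reward(exploration_gain: float,
                distances_to_peers: list[float],
                comm_weight: float = 0.5) -> float:
    """Hypothetical dynamic reward: movement value plus a connectivity bonus.

    `exploration_gain` stands in for whatever value the planner assigns to
    the move itself; the bonus is the fraction of peers reachable after it.
    """
    if not distances_to_peers:
        return exploration_gain
    reachable = sum(link_available(d) for d in distances_to_peers)
    return exploration_gain + comm_weight * reachable / len(distances_to_peers)
```

Because the connectivity term is a bonus rather than a hard constraint, a move that breaks every link is still permitted; it simply scores lower, which is one way to read the "soft" constraint idea under these assumptions.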
Abstract: The exploration of unknown, Global Navigation Satellite System (GNSS)-denied environments by an autonomous, communication-aware, and collaborative group of Unmanned Aerial Vehicles (UAVs) presents significant challenges in coordination, perception, and decentralised decision-making. This paper implements Multi-Agent Reinforcement Learning (MARL) to address these challenges in a 2D indoor environment, using high-fidelity game-engine simulations (Godot) and continuous action spaces. Policy training aims to achieve emergent collaborative behaviours and decision-making under uncertainty using Network-Distributed Partially Observable Markov Decision Processes (ND-POMDPs). Each UAV is equipped with a Light Detection and Ranging (LiDAR) sensor and can share data (sensor measurements and a local occupancy map) with neighbouring agents. Inter-agent communication constraints include limited range, limited bandwidth, and latency. Extensive ablation studies evaluated the MARL training paradigms, reward function, communication system, neural network (NN) architecture, memory mechanisms, and POMDP formulations. This work jointly addresses several key limitations in prior research, namely reliance on discrete actions, single-agent or centralised formulations, assumptions of a priori knowledge and permanent connectivity, inability to handle dynamic obstacles, short planning horizons, and architectural complexity in recurrent NNs and Transformers. Results show that the scalable training paradigm, combined with a simplified architecture, enables rapid autonomous exploration of an indoor area. The implementation of curriculum learning (five increasingly complex levels) also enabled faster, more robust training. This combination of high-fidelity simulation, MARL formulation, and computational efficiency establishes a strong foundation for deploying learned cooperative strategies in physical robotic systems.
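To make the range-limited map sharing concrete, the sketch below shows one possible way agents could merge local occupancy maps with neighbours inside a communication radius. It is a minimal illustration under stated assumptions, not the paper's implementation: the cell encoding, the function names, and the single-hop merge rule are hypothetical, and bandwidth and latency limits are ignored.

```python
import math

# Assumed cell encoding for a 2D occupancy grid.
UNKNOWN, FREE, OCCUPIED = -1, 0, 1


def neighbours_in_range(positions, i, comm_range):
    """Indices of agents within `comm_range` (Euclidean) of agent i."""
    xi, yi = positions[i]
    return [j for j, (x, y) in enumerate(positions)
            if j != i and math.hypot(x - xi, y - yi) <= comm_range]


def merge_maps(own, other):
    """Fill the agent's unknown cells with the neighbour's observations."""
    return [[o if o != UNKNOWN else n for o, n in zip(row_own, row_other)]
            for row_own, row_other in zip(own, other)]


def share_step(positions, maps, comm_range):
    """One synchronous, single-hop exchange of maps between neighbours.

    Every agent merges the maps its neighbours held before the exchange,
    so information travels at most one hop per call.
    """
    merged_maps = []
    for i, own in enumerate(maps):
        merged = own
        for j in neighbours_in_range(positions, i, comm_range):
            merged = merge_maps(merged, maps[j])
        merged_maps.append(merged)
    return merged_maps
```

For example, with three agents at `(0, 0)`, `(1, 0)`, and `(10, 0)` and a range of 2, the first two agents exchange maps while the third, being out of range, keeps its own map unchanged.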