Cooperative multi-agent reinforcement learning (MARL) has become an increasingly important research topic over the last half-decade because of its great potential for real-world applications. Owing to the curse of dimensionality, the popular "centralized training decentralized execution" framework requires long training times yet still cannot converge efficiently. In this paper, we propose a general training framework, MARL-LNS, that algorithmically addresses these issues by training on alternating subsets of agents, using existing deep MARL algorithms as low-level trainers without introducing any additional trainable parameters. On top of this framework, we provide three algorithm variants: random large neighborhood search (RLNS), batch large neighborhood search (BLNS), and adaptive large neighborhood search (ALNS), which alternate the subsets of agents differently. We test our algorithms on both the StarCraft Multi-Agent Challenge and Google Research Football, showing that they can automatically reduce training time by at least 10% while reaching the same final skill level as the original algorithm.
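The abstract does not spell out the subset schedules, so the following is a minimal, hypothetical sketch of what RLNS- and BLNS-style agent-subset alternation could look like; the function names and signatures are illustrative, not the paper's API.

```python
import random

def rlns_subsets(agent_ids, subset_size, num_iterations, seed=0):
    """RLNS-style schedule: at each training iteration, pick a random
    subset of agents to train while the rest keep their current policies."""
    rng = random.Random(seed)
    return [sorted(rng.sample(agent_ids, subset_size))
            for _ in range(num_iterations)]

def blns_subsets(agent_ids, subset_size, seed=0):
    """BLNS-style schedule: shuffle the agents once and sweep over them in
    fixed-size batches, so every agent is trained before any is revisited."""
    ids = list(agent_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i:i + subset_size] for i in range(0, len(ids), subset_size)]
```

Either schedule can then be handed to any existing deep MARL algorithm as the set of "active" agents per training phase, which is what lets the framework avoid adding trainable parameters.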
With the explosive influence of the success of large language models (LLMs) such as ChatGPT and GPT-4, extensive recent work has shown that foundation models can be used to solve a large variety of tasks. However, very little work shares insights on multi-agent planning. Multi-agent planning differs from other domains in that it combines the difficulties of multi-agent coordination and planning, making it hard to leverage external tools to facilitate the required reasoning. In this paper, we focus on the problem of multi-agent path finding (MAPF), also known as multi-robot route planning, and study how to solve MAPF with LLMs. We first show the motivating success on an empty room map without obstacles, then the failure to plan on a slightly harder room map. We present our hypothesis on why directly solving MAPF with LLMs has not yet been successful, and we use various experiments to support it.
Adaptive sampling and planning in robotic environmental monitoring are challenging when the target environmental process varies over space and time. The underlying environmental dynamics require the planning module to account for future environmental changes so that decisions made earlier do not quickly become outdated. We propose a Monte Carlo tree search method that not only balances exploration and exploitation of the environment in space, but also keeps up with its temporal dynamics. This is achieved by incorporating multi-objective optimization and a look-ahead, model-predictive rewarding mechanism. We show that by allowing the robot to leverage the simulated and predicted spatiotemporal environmental process, the proposed informative planning approach achieves superior performance compared with baseline methods in terms of the root mean square error of the environment model and the distance to the ground truth.
In autonomous ocean monitoring tasks, the sampling robot moves through the environment and accumulates data continuously. The widely adopted spatial modeling method, standard Gaussian process (GP) regression, becomes inadequate for processing the resulting large and growing volume of sensing data. To overcome this computational challenge, this paper presents an environmental modeling framework using a sparse variant of GP regression called the streaming sparse GP (SSGP). The SSGP handles streaming data in an online and incremental manner and is therefore suitable for long-term autonomous environmental monitoring. It summarizes the collected data using a small set of pseudo data points that best represent the whole dataset, and updates the hyperparameters and pseudo-point locations in a streaming fashion, leading to a high-quality approximation of the underlying environmental model with significantly reduced computational cost and memory demand.
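To make the computational challenge concrete, here is a minimal sketch of exact GP regression with an RBF kernel (standard textbook formulas, not the SSGP itself): the n x n linear solve costs O(n^3), which is exactly what becomes infeasible as the sensing stream grows and what the sparse pseudo-point summary avoids.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel matrix between 1-D point sets a, b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_posterior(x_train, y_train, x_test, lengthscale=1.0, noise=1e-2):
    """Exact GP regression posterior mean and marginal variance.
    The n x n solve below is the O(n^3) bottleneck that motivates
    sparse approximations such as the streaming sparse GP."""
    K = rbf(x_train, x_train, lengthscale) + noise * np.eye(len(x_train))
    Ks = rbf(x_test, x_train, lengthscale)
    Kss = rbf(x_test, x_test, lengthscale)
    alpha = np.linalg.solve(K, y_train)          # O(n^3) in train size
    mean = Ks @ alpha
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)
```

The SSGP replaces the full training set in these formulas with a small set of pseudo points, so the cubic cost is paid only in the (fixed, small) number of pseudo points rather than in the ever-growing data size.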
Robotic Information Gathering (RIG) is a foundational research topic that addresses how a robot (or robot team) collects informative data to efficiently build an accurate model of an unknown target function under robot embodiment constraints. RIG has many applications, including but not limited to autonomous exploration and mapping, 3D reconstruction or inspection, search and rescue, and environmental monitoring. A RIG system relies on a probabilistic model's prediction uncertainty to identify critical areas for informative data collection. Gaussian Processes (GPs) with stationary kernels have been widely adopted for spatial modeling. However, real-world spatial data is typically non-stationary -- different locations do not have the same degree of variability. As a result, the prediction uncertainty does not accurately reveal prediction error, limiting the success of RIG algorithms. We propose a family of non-stationary kernels named the Attentive Kernel (AK), which is simple, robust, and can extend any existing kernel to a non-stationary one. We evaluate the new kernel in elevation mapping tasks, where AK provides better accuracy and uncertainty quantification than the commonly used stationary kernels and the leading non-stationary kernels. The improved uncertainty quantification guides the downstream informative planner to collect more valuable data around high-error areas, further increasing prediction accuracy. A field experiment demonstrates that the proposed method can guide an Autonomous Surface Vehicle (ASV) to prioritize data collection in locations with significant spatial variations, enabling the model to characterize salient environmental features.
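One common way to build a non-stationary kernel from stationary ones, in the spirit of the idea described above, is to combine a bank of RBF kernels at different lengthscales with input-dependent weights. The sketch below illustrates that general construction; it is not the paper's exact Attentive Kernel (which, among other things, learns the weighting function), and `weight_fn` is a placeholder for whatever maps inputs to normalized weight vectors.

```python
import numpy as np

def mixture_kernel(x1, x2, weight_fn, lengthscales):
    """Illustrative non-stationary kernel:
        k(x, x') = sum_m w_m(x) * w_m(x') * k_m(x, x'),
    where each k_m is an RBF kernel with its own lengthscale and
    w(x) is a non-negative, input-dependent weight vector.
    Each term is a Schur product of PSD matrices, so the sum is PSD."""
    W1 = weight_fn(x1)                      # shape (n1, M)
    W2 = weight_fn(x2)                      # shape (n2, M)
    d2 = (x1[:, None] - x2[None, :]) ** 2
    k = np.zeros((len(x1), len(x2)))
    for m, ell in enumerate(lengthscales):
        k += np.outer(W1[:, m], W2[:, m]) * np.exp(-0.5 * d2 / ell ** 2)
    return k
```

Because the weights vary with location, regions assigned to short lengthscales get high variability while others stay smooth, which is the qualitative behavior a non-stationary model needs for uncertainty to track error.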
The paper introduces DiSProD, an online planner developed for environments with probabilistic transitions in continuous state and action spaces. DiSProD builds a symbolic graph that captures the distribution of future trajectories, conditioned on a given policy, using independence assumptions and approximate propagation of distributions. The symbolic graph provides a differentiable representation of the policy's value, enabling efficient gradient-based optimization for long-horizon search. The propagation of approximate distributions can be seen as an aggregation of many trajectories, making it well suited for sparse rewards and stochastic environments. An extensive experimental evaluation compares DiSProD to state-of-the-art planners in discrete-time planning and real-time control of robotic systems. The proposed method improves over existing planners in handling stochastic environments, sensitivity to search depth, reward sparsity, and large action spaces. Additional real-world experiments demonstrate that DiSProD can control ground vehicles and surface vessels to successfully navigate around obstacles.
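The core idea of propagating distributions instead of sampling trajectories can be illustrated with a toy example. The sketch below is not DiSProD's symbolic graph; it just shows, for hypothetical linear-Gaussian dynamics, how the mean and variance of the state can be pushed through the horizon in closed form, yielding an expected return that is a smooth function of the action sequence and hence amenable to gradient-based optimization.

```python
def propagate_expected_return(mu0, var0, actions,
                              a=0.9, noise_var=0.04, gamma=0.95):
    """Toy distribution propagation: for dynamics
        s' = a * s + u + eps,  eps ~ N(0, noise_var),
    propagate the state mean and variance step by step and accumulate
    the discounted expected reward E[-s'^2] = -(mu'^2 + var') in closed
    form, i.e., an 'aggregation of many trajectories' without sampling."""
    mu, var, ret, discount = mu0, var0, 0.0, 1.0
    for u in actions:
        mu = a * mu + u                    # mean propagation
        var = a * a * var + noise_var      # variance propagation
        ret += discount * -(mu ** 2 + var) # exact expectation of -s'^2
        discount *= gamma
    return ret
```

An action sequence that steers the mean toward zero scores strictly better than doing nothing, and because `ret` is differentiable in `actions`, a gradient optimizer can find such sequences directly.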
Robotic Information Gathering (RIG) relies on the uncertainty of a probabilistic model to identify critical areas for efficient data collection. Gaussian processes (GPs) with stationary kernels have been widely adopted for spatial modeling. However, real-world spatial data typically does not satisfy the assumption of stationarity, under which different locations are assumed to have the same degree of variability. As a result, the prediction uncertainty does not accurately capture prediction error, limiting the success of RIG algorithms. We propose a novel family of non-stationary kernels, named the Attentive Kernel (AK), which is simple, robust, and can extend any existing kernel to a non-stationary one. We evaluate the new kernel in elevation mapping tasks, where AK provides better accuracy and uncertainty quantification than the commonly used RBF kernel and other popular non-stationary kernels. The improved uncertainty quantification guides the downstream RIG planner to collect more valuable data around high-error areas, further increasing prediction accuracy. A field experiment demonstrates that the proposed method can guide an Autonomous Surface Vehicle (ASV) to prioritize data collection in locations with high spatial variations, enabling the model to characterize the salient environmental features.
Existing multi-agent perception systems assume that every agent uses the same model with identical parameters and architecture, which is often impractical in the real world. The significant performance boost brought by the multi-agent system can degrade dramatically when the perception models differ noticeably. In this work, we propose a model-agnostic multi-agent framework to reduce the negative effect caused by model discrepancies while maintaining confidentiality. Specifically, we address the perception heterogeneity between agents by integrating a novel uncertainty calibrator that eliminates the bias among agents' predicted confidence scores. Each agent performs this calibration independently on a standard public database, so intellectual property can be protected. To further refine detection accuracy, we also propose a new algorithm called Promotion-Suppression Aggregation (PSA) that considers not only the confidence scores of proposals but also the spatial agreement of their neighbors. Our experiments emphasize the necessity of model calibration across different agents, and the results show that our proposed approach outperforms the state-of-the-art baseline methods for 3D object detection on the open OPV2V dataset.
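The abstract does not describe the calibrator's internals, but a standard way to remove bias in predicted confidence scores is temperature scaling, which each agent can fit independently on a shared public dataset. The sketch below shows that standard technique as one plausible instantiation; the paper's actual calibrator may differ.

```python
import numpy as np

def fit_temperature(logits, labels, grid=np.linspace(0.25, 4.0, 76)):
    """Standard temperature scaling: pick the temperature T that
    minimizes the negative log-likelihood of softmax(logits / T) on a
    held-out calibration set. Each agent can run this privately on a
    public database, so its own model never needs to be shared."""
    labels = np.asarray(labels)
    def nll(T):
        z = logits / T
        z = z - z.max(axis=1, keepdims=True)           # stable log-softmax
        logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -logp[np.arange(len(labels)), labels].mean()
    return min(grid, key=nll)
```

An overconfident model (logits too spread out relative to its true accuracy) will be assigned a temperature greater than 1, flattening its confidence scores so they become comparable across heterogeneous agents before aggregation.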
In many environmental monitoring scenarios, the sampling robot needs to simultaneously explore the environment and exploit features of interest within limited time. We present an anytime multi-objective informative planning method called Pareto Monte Carlo tree search, which allows the robot to handle potentially competing objectives such as exploration versus exploitation. The method produces optimized decision solutions for the robot based on its knowledge (estimation) of the environment state, leading to better adaptation to environmental dynamics. We provide an algorithmic analysis of the critical tree-node selection step and show that the number of times sub-optimal nodes are chosen is logarithmically bounded and that the search result converges to the optimal choices at a polynomial rate.
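To make the node-selection step concrete, here is a hypothetical sketch of how a Pareto variant of UCB selection could work: compute a UCB score per objective for each child, keep only the children whose UCB vectors are Pareto-optimal, and break ties randomly among the front. This is an illustration of the general idea, not the paper's exact selection rule.

```python
import math
import random

def pareto_ucb_select(children, total_visits, c=1.4, seed=0):
    """Pareto-style node selection. `children` is a list of
    (visit_count, mean_reward_vector) pairs; each objective gets its
    own UCB score, and a child survives only if no other child's UCB
    vector dominates it in every objective."""
    ucbs = []
    for n, means in children:
        bonus = c * math.sqrt(math.log(total_visits) / n)
        ucbs.append([m + bonus for m in means])
    def dominated(i):
        return any(all(a >= b for a, b in zip(ucbs[j], ucbs[i]))
                   and any(a > b for a, b in zip(ucbs[j], ucbs[i]))
                   for j in range(len(ucbs)) if j != i)
    front = [i for i in range(len(ucbs)) if not dominated(i)]
    return random.Random(seed).choice(front)   # random tie-break on the front
```

Selecting uniformly from the Pareto front (rather than scalarizing the objectives up front) is what lets a single search tree serve competing objectives such as exploration versus exploitation.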
Informative planning seeks a sequence of actions that guides the robot to collect the most informative data to map a large environment or learn a dynamical system. Existing work in informative planning mainly focuses on proposing new planners and applying them to various robotic applications such as environmental monitoring, autonomous exploration, and system identification. Informative planners optimize an objective given by a probabilistic model, e.g., Gaussian process regression. In practice, the model can easily be affected by ubiquitous sensing outliers, resulting in a misleading objective. A straightforward solution is to filter outliers out of the sensing data stream using an off-the-shelf outlier detector. However, informative samples are by definition also scarce, so they might be falsely filtered out. In this paper, we propose a method that enables the robot to revisit the locations where outliers were sampled, in addition to optimizing the informative planning objective. By doing so, the robot can collect more samples in the vicinity of outliers and update the outlier detector to reduce the number of false alarms. This is achieved by designing a new objective on top of a Pareto variant of Monte Carlo tree search. We demonstrate that the proposed framework achieves better performance than simply applying an outlier detector.
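The following toy sketch illustrates the two ingredients described above: a simple off-the-shelf-style outlier detector (a z-score rule, standing in for whatever detector is used) and a hypothetical secondary objective that rewards revisiting flagged locations. The function names and the exponential reward shape are illustrative, not the paper's formulation.

```python
import numpy as np

def flag_outliers(values, z_thresh=3.0):
    """Simple z-score outlier detector over a sensing stream; returns the
    indices flagged as outliers. Genuinely informative samples can be
    falsely flagged by such a rule, which motivates revisiting them."""
    mu, sigma = np.mean(values), np.std(values) + 1e-12
    return np.where(np.abs((values - mu) / sigma) > z_thresh)[0]

def revisit_reward(candidate, outlier_locations, scale=1.0):
    """Secondary objective: reward candidate sampling locations close to
    previously flagged outliers, so the robot gathers more evidence
    there and the detector's false-alarm rate can be reduced."""
    if len(outlier_locations) == 0:
        return 0.0
    dists = np.linalg.norm(np.asarray(outlier_locations) - candidate, axis=1)
    return float(np.exp(-np.min(dists) / scale))
```

A Pareto planner can then treat `revisit_reward` as one objective alongside the usual information-gain objective, rather than collapsing the two into a single weighted sum.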