One of the preeminent obstacles to scaling multi-agent reinforcement learning to large numbers of agents is assigning credit to individual agents' actions. In this paper, we address this credit assignment problem with an approach that we call \textit{partial reward decoupling} (PRD), which attempts to decompose large cooperative multi-agent RL problems into decoupled subproblems involving subsets of agents, thereby simplifying credit assignment. We empirically demonstrate that decomposing the RL problem using PRD in an actor-critic algorithm results in lower variance policy gradient estimates, which improves data efficiency, learning stability, and asymptotic performance across a wide array of multi-agent RL tasks, compared to various other actor-critic approaches. Additionally, we relate our approach to counterfactual multi-agent policy gradient (COMA), a state-of-the-art MARL algorithm, and empirically show that our approach outperforms COMA by making better use of information in agents' reward streams, and by enabling recent advances in advantage estimation to be used.
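To make the idea of decoupling concrete, the sketch below computes a discounted return for one agent using only the reward streams of a subset of agents. It is an illustration under assumptions, not the PRD algorithm itself: the names `discounted_returns` and `decoupled_returns` are hypothetical, and the subset (`relevant`) is assumed to be given and fixed over the episode.

```python
from typing import List, Sequence, Set


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every time step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Accumulate backwards so each step reuses the return of the next step.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def decoupled_returns(
    reward_streams: Sequence[Sequence[float]],
    relevant: Set[int],
    gamma: float,
) -> List[float]:
    """Discounted return built only from the agents listed in `relevant`.

    reward_streams[j][t] is the reward received by agent j at time t.
    `relevant` is an assumed, externally supplied subset of agent indices.
    """
    horizon = len(reward_streams[0])
    summed = [sum(reward_streams[j][t] for j in relevant) for t in range(horizon)]
    return discounted_returns(summed, gamma)


if __name__ == "__main__":
    streams = [[1.0, 0.0, 1.0], [10.0, 10.0, 10.0]]
    # Agent 0 credited only with its own reward stream.
    print(decoupled_returns(streams, {0}, gamma=0.5))
    # Agent 0 credited with the rewards of all agents (fully coupled case).
    print(decoupled_returns(streams, {0, 1}, gamma=0.5))
```

In this toy example, restricting the sum to agent 0 removes the large rewards of agent 1 from the learning signal, which is the kind of variance reduction the abstract attributes to decoupling.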
Serially connected robots are promising candidates for performing tasks in confined spaces such as search-and-rescue in large-scale disasters. Such robots are typically limbless, and we hypothesize that the addition of limbs could improve mobility. However, a challenge in designing and controlling such devices lies in the coordination of high-dimensional redundant modules in a way that improves mobility. Here we develop a general framework to control serially connected multi-legged robots. Specifically, we combine two approaches to build a general shape control scheme which can provide baseline patterns of self-deformation ("gaits") for effective locomotion in diverse robot morphologies. First, we take inspiration from a dimensionality reduction and a biological gait classification scheme to generate cyclic patterns of body deformation and foot lifting/lowering, which facilitate generation of arbitrary substrate contact patterns. Second, we use geometric mechanics methods to facilitate identification of the optimal phasing of these undulations to maximize speed and/or stability. Our scheme allows the development of effective gaits in multi-legged robots locomoting on flat frictional terrain with diverse numbers of limbs (4, 6, 16, and even 0 limbs) and body actuation capabilities (including sidewinding gaits on limbless devices). By properly coordinating the body undulation and the leg placement, our framework combines the advantages of both limbless robots (modularity) and legged robots (mobility). We expect that our framework can provide general control schemes for the rapid deployment of general multi-legged robots, paving the way toward machines that can traverse complex environments under real-life conditions.
Autonomous Exploration Development Environment is an open-source repository released to facilitate the development of high-level planning algorithms and integration of complete autonomous navigation systems. The repository contains representative simulation environment models, fundamental navigation modules, e.g., local planner, terrain traversability analysis, waypoint following, and visualization tools. Together with two of our high-level planner releases -- TARE planner for exploration and FAR planner for route planning, we detail usage of the three open-source repositories and share experiences in the integration of autonomous navigation systems. We use DARPA Subterranean Challenge as a use case where the repositories together form the main navigation system of the CMU-OSU Team. In the end, we discuss a few potential use cases in extended applications.
Multi-Agent Path Finding (MAPF) finds conflict-free paths for multiple agents from their respective start to goal locations. MAPF is challenging as the joint configuration space grows exponentially with respect to the number of agents. Among MAPF planners, search-based methods, such as CBS and M*, effectively bypass the curse of dimensionality by employing a dynamically-coupled strategy: agents are planned in a fully decoupled manner at first, where potential conflicts between agents are ignored; and then agents either follow their individual plans or are coupled together for planning to resolve the conflicts between them. In general, the number of conflicts to be resolved decides the run time of these planners and most of the existing work focuses on how to efficiently resolve these conflicts. In this work, we take a different view and aim to reduce the number of conflicts (and thus improve the overall search efficiency) by improving each agent's individual plan. By leveraging a Visual Transformer, we develop a learning-based single-agent planner, which plans for a single agent while paying attention to both the structure of the map and other agents with whom conflicts may happen. We then develop a novel multi-agent planner called LM* by integrating this learning-based single-agent planner with M*. Our results show that for both "seen" and "unseen" maps, in comparison with M*, LM* has fewer conflicts to be resolved and thus, runs faster and enjoys higher success rates. We empirically show that MAPF solutions computed by LM* are near-optimal. Our code is available at https://github.com/lakshayvirmani/learning-assisted-mstar .
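The quantity that the abstract ties to run time, the number of conflicts among individually planned paths, can be sketched as follows. This is a minimal illustration under assumptions, not code from LM* or M*: the helpers `position_at` and `count_conflicts` are hypothetical, paths are lists of grid cells indexed by time step, agents are assumed to wait at their goal after arriving, and only vertex conflicts (same cell, same time) and swap conflicts (two agents exchanging cells) are counted.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Cell = Tuple[int, int]


def position_at(path: Sequence[Cell], t: int) -> Cell:
    """Cell occupied at time t; an agent is assumed to wait at its goal."""
    return path[min(t, len(path) - 1)]


def count_conflicts(paths: List[Sequence[Cell]]) -> int:
    """Count vertex and swap conflicts over all pairs of agents."""
    horizon = max(len(p) for p in paths)
    conflicts = 0
    for a, b in combinations(range(len(paths)), 2):
        for t in range(horizon):
            pa, pb = position_at(paths[a], t), position_at(paths[b], t)
            if pa == pb:
                # Vertex conflict: both agents occupy the same cell at time t.
                conflicts += 1
            elif (
                t + 1 < horizon
                and position_at(paths[a], t + 1) == pb
                and position_at(paths[b], t + 1) == pa
            ):
                # Swap conflict: the agents exchange cells between t and t + 1.
                conflicts += 1
    return conflicts


if __name__ == "__main__":
    head_on = [[(0, 0), (0, 1), (0, 2)], [(0, 2), (0, 1), (0, 0)]]
    print(count_conflicts(head_on))
```

A single-agent planner that is aware of other agents aims to produce individual paths for which this count is small before any coupled planning takes place.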
Robots often interact with the world via attached parts such as wheels, joints, or appendages. In many systems, these interactions, and the manner in which they lead to locomotion, can be understood using the machinery of geometric mechanics, explaining how inputs in the shape space of a robot affect motion in its configuration space and the configuration space of its environment. In this paper we consider an opposite type of locomotion, wherein robots are influenced actively by interactions with an externally forced ambient medium. We investigate two examples of externally actuated systems; one for which locomotion is governed by a principal connection, and is usually considered to possess no drift dynamics, and another for which no such connection exists, with drift inherent in its locomotion. For the driftless system, we develop geometric tools based on previously understood internally actuated versions of the system and demonstrate their use for motion planning under external actuation. For the system possessing drift, we employ nonholonomic reduction to obtain a reduced representation of the system dynamics, illustrate geometric features conducive to studying locomotion, and derive strategies for external actuation.
This paper addresses a generalization of the well-known multi-agent path finding (MAPF) problem that optimizes multiple conflicting objectives simultaneously, such as travel time and path risk. This generalization, referred to as multi-objective MAPF (MOMAPF), arises in several applications ranging from hazardous material transportation to construction site planning. In this paper, we present a new multi-objective conflict-based search (MO-CBS) approach that relies on a novel multi-objective safe interval path planning (MO-SIPP) algorithm for its low-level search. We first develop the MO-SIPP algorithm, show its properties, and then embed it in MO-CBS. We present extensive numerical results to show that, in comparison with the state of the art, the proposed approach obtains (1) an order of magnitude improvement in the average low-level search time, and (2) a significant improvement in the success rates of finding the Pareto-optimal front. Finally, we also provide a case study to demonstrate the potential application of the proposed algorithms for construction site planning.
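For readers unfamiliar with the Pareto-optimal front mentioned above, the following sketch filters a set of cost vectors down to the non-dominated ones. It is a generic illustration, not part of MO-CBS or MO-SIPP: the function names `dominates` and `pareto_front` are hypothetical, and each cost vector is assumed to be a tuple such as (travel time, path risk) in which lower is better in every component.

```python
from typing import List, Sequence, Tuple

Cost = Tuple[float, ...]


def dominates(a: Cost, b: Cost) -> bool:
    """True if a is no worse than b in every objective and better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(costs: Sequence[Cost]) -> List[Cost]:
    """Return the non-dominated cost vectors, preserving input order."""
    front: List[Cost] = []
    for candidate in costs:
        if any(dominates(other, candidate) for other in costs):
            continue  # Some other solution is at least as good everywhere.
        if candidate not in front:
            front.append(candidate)  # Skip exact duplicates.
    return front


if __name__ == "__main__":
    # Hypothetical (travel time, path risk) pairs for five candidate solutions.
    solutions = [(3, 5), (4, 4), (5, 5), (3, 6), (6, 1)]
    print(pareto_front(solutions))
```

The quadratic pairwise check is chosen for clarity; it is adequate for small solution sets but is not meant to reflect how a multi-objective search maintains its frontier.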
Incremental graph search algorithms, such as D* Lite, reuse previous search efforts to speed up subsequent similar path planning tasks. These algorithms have demonstrated their efficiency in comparison with search from scratch, and have been leveraged in many applications such as navigation in unknown terrain. On the other hand, path planning typically involves optimizing multiple conflicting objectives simultaneously, such as travel risk and arrival time. Multi-objective path planning is challenging as the number of "Pareto-optimal" solutions can grow exponentially with respect to the size of the graph, which makes it computationally burdensome to plan from scratch each time similar planning tasks need to be solved. This article presents a new multi-objective incremental search algorithm called Multi-Objective Path-Based D* Lite (MOPBD*), which reuses previous search efforts to speed up subsequent planning tasks while optimizing multiple objectives. Numerical results show that MOPBD* is more efficient than search from scratch and runs an order of magnitude faster than existing incremental methods for multi-objective path planning.
Advanced resuscitative technologies, such as Extracorporeal Membrane Oxygenation (ECMO) cannulation or Resuscitative Endovascular Balloon Occlusion of the Aorta (REBOA), are technically difficult even for skilled medical personnel. This paper describes the core technologies that comprise a teleoperated system capable of granting femoral vascular access, which is an important step in both of these procedures and a major roadblock in their wider use in the field. These technologies include a kinematic manipulator, various sensing modalities, and a user interface. In addition, we evaluate our system on a surgical phantom as well as in in-vivo porcine experiments. These experiments resulted in, to the best of our knowledge, the first robot-assisted arterial catheterizations, a major step towards our eventual goal of automatic catheter insertion through the Seldinger technique.
One of the main obstacles to 3D semantic segmentation is the significant effort required to generate expensive point-wise annotations for fully supervised training. To reduce manual effort, we propose GIDSeg, a novel approach that learns segmentation from sparse annotations by reasoning simultaneously about global-regional structures and individual-vicinal properties. GIDSeg models global and individual relations via a dynamic edge convolution network coupled with a kernelized identity descriptor. The ensemble effects are obtained by endowing a low-resolution voxelized map with a fine-grained receptive field. In GIDSeg, an adversarial learning module is also designed to further enhance the conditional constraint of identity descriptors within the joint feature distribution. Despite its apparent simplicity, our proposed approach achieves superior performance over state-of-the-art methods for inferring dense 3D segmentation with only sparse annotations. In particular, with annotations for only $5\%$ of the raw data, GIDSeg outperforms other 3D segmentation methods.
We present a method for localizing a single camera with respect to a point cloud map in indoor and outdoor scenes. The problem is challenging because correspondences of local invariant features are inconsistent across the image and 3D domains. The problem is even more challenging as the method must handle various environmental conditions such as illumination, weather, and seasonal changes. Our method can match equirectangular images to the 3D range projections by extracting cross-domain symmetric place descriptors. Our key insight is to retain condition-invariant 3D geometry features from limited data samples while eliminating the condition-related features with a purpose-designed Generative Adversarial Network. Based on such features, we further design a spherical convolution network to learn viewpoint-invariant symmetric place descriptors. We evaluate our method on extensive self-collected datasets, which involve \textit{Long-term} (variant appearance conditions), \textit{Large-scale} (up to $2$ km of structured/unstructured environment), and \textit{Multistory} (four-floor confined space) settings. Our method surpasses other current state-of-the-art methods, achieving around $3$ times higher place retrieval in inconsistent environments and more than $3$ times higher accuracy in online localization. To highlight our method's generalization capabilities, we also evaluate the recognition across different datasets. With a single trained model, i3dLoc can demonstrate reliable visual localization under random conditions.