MIT
Abstract:The reliable deployment of deep reinforcement learning in real-world settings requires the ability to generalize across a variety of conditions, including both in-distribution scenarios seen during training as well as novel out-of-distribution scenarios. In this work, we present a framework for dynamics generalization in deep reinforcement learning that unifies these two distinct types of generalization within a single architecture. We introduce a robust adaptation module that provides a mechanism for identifying and reacting to both in-distribution and out-of-distribution environment dynamics, along with a joint training pipeline that combines the goals of in-distribution adaptation and out-of-distribution robustness. Our algorithm GRAM achieves strong generalization performance across in-distribution and out-of-distribution scenarios upon deployment, which we demonstrate on a variety of realistic simulated locomotion tasks with a quadruped robot.
Abstract:This paper presents Robust samplE-based coVarIance StEering (REVISE), a multi-query algorithm that generates robust belief roadmaps for dynamic systems navigating through spatially dependent disturbances modeled as a Gaussian random field. Our proposed method develops a novel robust sample-based covariance steering edge controller to safely steer a robot between state distributions, satisfying state constraints along the trajectory. Our proposed approach also incorporates an edge rewiring step into the belief roadmap construction process, which provably improves the coverage of the belief roadmap. When compared to state-of-the-art methods, REVISE improves median plan accuracy (as measured by Wasserstein distance between the actual and planned final state distribution) by 10x in multi-query planning and reduces median plan cost (as measured by the largest eigenvalue of the planned state covariance at the goal) by 2.5x in single-query planning for a 6DoF system. We will release our code at https://acl.mit.edu/REVISE/.
Abstract:Global localization is a fundamental capability required for long-term and drift-free robot navigation. However, current methods fail to relocalize when faced with significantly different viewpoints. We present ROMAN (Robust Object Map Alignment Anywhere), a robust global localization method capable of localizing in challenging and diverse environments based on creating and aligning maps of open-set and view-invariant objects. To address localization difficulties caused by feature-sparse or perceptually aliased environments, ROMAN formulates and solves a registration problem between object submaps using a unified graph-theoretic global data association approach that simultaneously accounts for object shape and semantic similarities and a prior on gravity direction. Through a set of challenging large-scale multi-robot or multi-session SLAM experiments in indoor, urban and unstructured/forested environments, we demonstrate that ROMAN achieves a maximum recall 36% higher than other object-based map alignment methods and an absolute trajectory error that is 37% lower than using visual features for loop closures. Our project page can be found at https://acl.mit.edu/ROMAN/.
Abstract:Marine robots must maintain precise control and ensure safety during tasks like ocean monitoring, even when encountering unpredictable disturbances that affect performance. Designing algorithms for uncrewed surface vehicles (USVs) requires accounting for these disturbances to control the vehicle and ensure it avoids obstacles. While adaptive control has addressed USV control challenges, real-world applications are limited, and certifying USV safety amidst unexpected disturbances remains difficult. To tackle control issues, we employ a model reference adaptive controller (MRAC) to stabilize the USV along a desired trajectory. For safety certification, we developed a reachability module with a moving horizon estimator (MHE) to estimate disturbances affecting the USV. This estimate is propagated through a forward reachable set calculation, predicting future states and enabling real-time safety certification. We tested our safe autonomy pipeline on a Clearpath Heron USV in the Charles River, near MIT. Our experiments demonstrated that the USV's MRAC controller and reachability module could adapt to disturbances like thruster failures and drag forces. The MRAC controller outperformed a PID baseline, showing a 45%-81% reduction in RMSE position error. Additionally, the reachability module provided real-time safety certification, ensuring the USV's safety. We further validated our pipeline's effectiveness in underway replenishment and canal scenarios, simulating relevant marine tasks.
Abstract:Self-supervised learning is a powerful approach for developing traversability models for off-road navigation, but these models often struggle with inputs unseen during training. Existing methods utilize techniques like evidential deep learning to quantify model uncertainty, helping to identify and avoid out-of-distribution terrain. However, always avoiding out-of-distribution terrain can be overly conservative, e.g., when novel terrain can be effectively analyzed using a physics-based model. To overcome this challenge, we introduce Physics-Informed Evidential Traversability (PIETRA), a self-supervised learning framework that integrates physics priors directly into the mathematical formulation of evidential neural networks and introduces physics knowledge implicitly through an uncertainty-aware, physics-informed training loss. Our evidential network seamlessly transitions between learned and physics-based predictions for out-of-distribution inputs. Additionally, the physics-informed loss regularizes the learned model, ensuring better alignment with the physics model. Extensive simulations and hardware experiments demonstrate that PIETRA improves both learning accuracy and navigation performance in environments with significant distribution shifts.
Abstract:In decentralized multiagent trajectory planners, agents need to communicate and exchange their positions to generate collision-free trajectories. However, due to localization errors/uncertainties, trajectory deconfliction can fail even if trajectories are perfectly shared between agents. To address this issue, we first present PARM and PARM*, perception-aware, decentralized, asynchronous multiagent trajectory planners that enable a team of agents to navigate uncertain environments while deconflicting trajectories and avoiding obstacles using perception information. PARM* differs from PARM as it is less conservative, using more computation to find closer-to-optimal solutions. While these methods achieve state-of-the-art performance, they suffer from high computational costs as they need to solve large optimization problems onboard, making it difficult for agents to replan at high rates. To overcome this challenge, we present our second key contribution, PRIMER, a learning-based planner trained with imitation learning (IL) using PARM* as the expert demonstrator. PRIMER leverages the low computational requirements at deployment of neural networks and achieves a computation speed up to 5500 times faster than optimization-based approaches.
Abstract:Knowing the locations of nearby moving objects is important for a mobile robot to operate safely in a dynamic environment. Dynamic object tracking performance can be improved if robots share observations of tracked objects with nearby team members in real-time. To share observations, a robot must make up-to-date estimates of the transformation from its coordinate frame to the frame of each neighbor, which can be challenging because of odometry drift. We present Multiple Object Tracking with Localization Error Elimination (MOTLEE), a complete system for a multi-robot team to accurately estimate frame transformations and collaboratively track dynamic objects. To accomplish this, robots use open-set image-segmentation methods to build object maps of their environment and then use our Temporally Consistent Alignment of Frames Filter (TCAFF) to align maps and estimate coordinate frame transformations without any initial knowledge of neighboring robot poses. We show that our method for aligning frames enables a team of four robots to collaboratively track six pedestrians with accuracy similar to that of a system with ground truth localization in a challenging hardware demonstration. The code and hardware dataset are available at https://github.com/mit-acl/motlee.
Abstract:Despite notable successes of Reinforcement Learning (RL), the prevalent use of an online learning paradigm prevents its widespread adoption, especially in hazardous or costly scenarios. Offline RL has emerged as an alternative solution, learning from pre-collected static datasets. However, this offline learning introduces a new challenge known as distributional shift, degrading the performance when the policy is evaluated on scenarios that are Out-Of-Distribution (OOD) from the training dataset. Most existing offline RL resolves this issue by regularizing policy learning within the information supported by the given dataset. However, such regularization overlooks the potential for high-reward regions that may exist beyond the dataset. This motivates exploring novel offline learning techniques that can make improvements beyond the data support without compromising policy performance, potentially by learning causation (cause-and-effect) instead of correlation from the dataset. In this paper, we propose the MOOD-CRL (Model-based Offline OOD-Adapting Causal RL) algorithm, which aims to address the challenge of extrapolation for offline policy training through causal inference instead of policy-regularizing methods. Specifically, Causal Normalizing Flow (CNF) is developed to learn the transition and reward functions for data generation and augmentation in offline policy evaluation and training. Based on the data-invariant, physics-based qualitative causal graph and the observational data, we develop a novel learning scheme for CNF to learn the quantitative structural causal model. As a result, CNF gains predictive and counterfactual reasoning capabilities for sequential decision-making tasks, revealing a high potential for OOD adaptation. Our CNF-based offline RL approach is validated through empirical evaluations, outperforming model-free and model-based methods by a significant margin.
Abstract:Traditional optimization-based planners, while effective, suffer from high computational costs, resulting in slow trajectory generation. A successful strategy to reduce computation time involves using Imitation Learning (IL) to develop fast neural network (NN) policies from those planners, which are treated as expert demonstrators. Although the resulting NN policies are effective at quickly generating trajectories similar to those from the expert, (1) their output does not explicitly account for dynamic feasibility, and (2) the policies do not accommodate changes in the constraints different from those used during training. To overcome these limitations, we propose Constraint-Guided Diffusion (CGD), a novel IL-based approach to trajectory planning. CGD leverages a hybrid learning/online optimization scheme that combines diffusion policies with a surrogate efficient optimization problem, enabling the generation of collision-free, dynamically feasible trajectories. The key ideas of CGD include dividing the original challenging optimization problem solved by the expert into two more manageable sub-problems: (a) efficiently finding collision-free paths, and (b) determining a dynamically-feasible time-parametrization for those paths to obtain a trajectory. Compared to conventional neural network architectures, we demonstrate through numerical evaluations significant improvements in performance and dynamic feasibility under scenarios with new constraints never encountered during training.
Abstract:The paper presents Maximal Covariance Backward Reachable Trees (MAXCOVAR BRT), which is a multi-query algorithm for planning of dynamic systems under stochastic motion uncertainty and constraints on the control input with explicit coverage guarantees. In contrast to existing roadmap-based probabilistic planning methods that sample belief nodes randomly and draw edges between them \cite{csbrm_tro2024}, under control constraints, the reachability of belief nodes needs to be explicitly established and is determined by checking the feasibility of a non-convex program. Moreover, there is no explicit consideration of coverage of the roadmap while adding nodes and edges during the construction procedure for the existing methods. Our contribution is a novel optimization formulation to add nodes and construct the corresponding edge controllers such that the generated roadmap results in provably maximal coverage under control constraints as compared to any other method of adding nodes and edges. We characterize formally the notion of coverage of a roadmap in this stochastic domain via introduction of the h-$\operatorname{BRS}$ (Backward Reachable Set of Distributions) of a tree of distributions under control constraints, and also support our method with extensive simulations on a 6 DoF model.