Luca Carlone

Hierarchical Representations and Explicit Memory: Learning Effective Navigation Policies on 3D Scene Graphs using Graph Neural Networks

Aug 02, 2021

Zachary Ravichandran, Lisa Peng, Nathan Hughes, J. Daniel Griffith, Luca Carlone

Abstract: Representations are crucial for a robot to learn effective navigation policies. Recent work has shown that mid-level perceptual abstractions, such as depth estimates or 2D semantic segmentation, lead to more effective policies when provided as observations in place of raw sensor data (e.g., RGB images). However, such policies must still learn latent three-dimensional scene properties from mid-level abstractions. In contrast, high-level, hierarchical representations such as 3D scene graphs explicitly provide a scene's geometry, topology, and semantics, making them compelling representations for navigation. In this work, we present a reinforcement learning framework that leverages high-level hierarchical representations to learn navigation policies. Towards this goal, we propose a graph neural network architecture and show how to embed a 3D scene graph into an agent-centric feature space, which enables the robot to learn policies for low-level action in an end-to-end manner. For each node in the scene graph, our method uses features that capture occupancy and semantic content, while explicitly retaining memory of the robot trajectory. We demonstrate the effectiveness of our method against commonly used visuomotor policies in a challenging object search task. These experiments and supporting ablation studies show that our method leads to more effective object search behaviors, exhibits improved long-term memory, and successfully leverages hierarchical information to guide its navigation objectives.

Kimera-Multi: Robust, Distributed, Dense Metric-Semantic SLAM for Multi-Robot Systems

Jun 28, 2021

Yulun Tian, Yun Chang, Fernando Herrera Arias, Carlos Nieto-Granda, Jonathan P. How, Luca Carlone

Abstract: This paper presents Kimera-Multi, the first multi-robot system that (i) is robust and capable of identifying and rejecting incorrect inter- and intra-robot loop closures resulting from perceptual aliasing, (ii) is fully distributed and only relies on local (peer-to-peer) communication to achieve distributed localization and mapping, and (iii) builds a globally consistent metric-semantic 3D mesh model of the environment in real-time, where faces of the mesh are annotated with semantic labels. Kimera-Multi is implemented by a team of robots equipped with visual-inertial sensors. Each robot builds a local trajectory estimate and a local mesh using Kimera. When communication is available, robots initiate a distributed place recognition and robust pose graph optimization protocol based on a novel distributed graduated non-convexity algorithm. The proposed protocol allows the robots to improve their local trajectory estimates by leveraging inter-robot loop closures while being robust to outliers. Finally, each robot uses its improved trajectory estimate to correct the local mesh using mesh deformation techniques. We demonstrate Kimera-Multi in photo-realistic simulations, SLAM benchmarking datasets, and challenging outdoor datasets collected using ground robots. Both real and simulated experiments involve long trajectories (e.g., up to 800 meters per robot). The experiments show that Kimera-Multi (i) outperforms the state of the art in terms of robustness and accuracy, (ii) achieves estimation errors comparable to a centralized SLAM system while being fully distributed, (iii) is parsimonious in terms of communication bandwidth, (iv) produces accurate metric-semantic 3D meshes, and (v) is modular and can also be used for standard 3D reconstruction (i.e., without semantic labels) or for trajectory estimation (i.e., without reconstructing a 3D mesh).

* 18 pages, 15 figures
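
The Kimera-Multi abstract relies on graduated non-convexity (GNC) for outlier rejection. The idea can be illustrated with a minimal, centralized sketch that robustly averages scalar measurements. Everything here is an assumption made for illustration: the function name `gnc_robust_mean`, the Geman-McClure surrogate cost, the `mu_divisor` schedule, and the scalar problem itself. The distributed pose graph protocol described in the abstract is considerably more involved.

```python
def gnc_robust_mean(measurements, noise_bound, mu_divisor=1.4, max_iters=100):
    """Toy GNC with a Geman-McClure surrogate (an assumed choice of cost).

    Alternates a closed-form weight update with a weighted least-squares
    update while the control parameter mu is annealed towards 1, which
    gradually restores the non-convexity of the robust cost.
    """
    c2 = noise_bound ** 2
    # Start from the non-robust least-squares solution (all weights equal).
    estimate = sum(measurements) / len(measurements)
    weights = [1.0] * len(measurements)
    max_sq_residual = max((m - estimate) ** 2 for m in measurements)
    # A large initial mu makes the surrogate cost close to convex.
    mu = max(1.0, 2.0 * max_sq_residual / c2)

    for _ in range(max_iters):
        # Weight update: large residuals receive weights close to zero.
        weights = [(mu * c2 / ((m - estimate) ** 2 + mu * c2)) ** 2
                   for m in measurements]
        # Variable update: weighted least squares (a weighted mean here).
        estimate = sum(w * m for w, m in zip(weights, measurements)) / sum(weights)
        if mu <= 1.0:
            break
        mu = max(1.0, mu / mu_divisor)
    return estimate, weights


if __name__ == "__main__":
    data = [1.0, 1.1, 0.9, 1.05, 0.95, 10.0, 12.0]  # last two are outliers
    estimate, weights = gnc_robust_mean(data, noise_bound=0.2)
    print(round(estimate, 3), [round(w, 3) for w in weights])
```

In this toy problem the weights of the two gross outliers shrink towards zero as `mu` decreases, so the final estimate is driven by the five consistent measurements.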

STRIDE along Spectrahedral Vertices for Solving Large-Scale Rank-One Semidefinite Relaxations

May 28, 2021

Heng Yang, Ling Liang, Kim-Chuan Toh, Luca Carlone

Abstract: We consider solving high-order semidefinite programming (SDP) relaxations of nonconvex polynomial optimization problems (POPs) that admit rank-one optimal solutions. Existing approaches, which solve the SDP independently from the POP, either cannot scale to large problems or suffer from slow convergence due to the typical degeneracy of such SDPs. We propose a new algorithmic framework, called SpecTrahedral pRoximal gradIent Descent along vErtices (STRIDE), that blends fast local search on the nonconvex POP with global descent on the convex SDP. Specifically, STRIDE follows a globally convergent trajectory driven by a proximal gradient method (PGM) for solving the SDP, while simultaneously probing long, but safeguarded, rank-one "strides", generated by fast nonlinear programming algorithms on the POP, to seek rapid descent. We prove that STRIDE is globally convergent. To solve the subproblem of projecting a given point onto the feasible set of the SDP, we reformulate the projection step as a continuously differentiable unconstrained optimization and apply a limited-memory BFGS method to achieve both scalability and accuracy. We conduct numerical experiments on solving second-order SDP relaxations arising from two important applications in machine learning and computer vision. STRIDE dominates a diverse set of five existing SDP solvers and is the only solver that can solve degenerate rank-one SDPs to high accuracy (e.g., KKT residuals below 1e-9), even in the presence of millions of equality constraints.

* 9 pages main context, 2 figures
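
For orientation, the two ingredients named in the STRIDE abstract (an SDP with rank-one optimal solutions, and a proximal gradient step that requires projecting onto the SDP feasible set) can be written in generic notation. The symbols $C$, $\mathcal{A}$, $b$, $\sigma_k$, and $\mathcal{F}$ are assumptions chosen here for illustration and may not match the paper's notation:

$$
\min_{X \in \mathbb{S}^n} \ \langle C, X \rangle \quad \text{s.t.} \quad \mathcal{A}(X) = b, \quad X \succeq 0,
$$

$$
X_{k+1} = \Pi_{\mathcal{F}}\left( X_k - \sigma_k C \right), \qquad \mathcal{F} = \{ X \in \mathbb{S}^n : \mathcal{A}(X) = b, \ X \succeq 0 \}.
$$

The first line is a standard-form SDP, where a rank-one optimal solution has the form $X^\star = v v^\top$. The second line is one projected gradient step with step size $\sigma_k > 0$: because the objective is linear, its gradient is simply $C$, and $\Pi_{\mathcal{F}}$ is the projection subproblem that the abstract says is solved with a limited-memory BFGS method.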

Neural Trees for Learning on Graphs

May 15, 2021

Rajat Talak, Siyi Hu, Lisa Peng, Luca Carlone

Abstract: Graph Neural Networks (GNNs) have emerged as a flexible and powerful approach for learning over graphs. Despite this success, existing GNNs are constrained by their local message-passing architecture and are provably limited in their expressive power. In this work, we propose a new GNN architecture -- the Neural Tree. The neural tree architecture does not perform message passing on the input graph but on a tree-structured graph, called the H-tree, that is constructed from the input graph. Nodes in the H-tree correspond to subgraphs in the input graph, and they are reorganized in a hierarchical manner such that a parent node of a node in the H-tree always corresponds to a larger subgraph in the input graph. We show that the neural tree architecture can approximate any smooth probability distribution function over an undirected graph, as well as emulate the junction tree algorithm. We also prove that the number of parameters needed to achieve an $\epsilon$-approximation of the distribution function is exponential in the treewidth of the input graph, but linear in its size. We apply the neural tree to semi-supervised node classification in 3D scene graphs, and show that these theoretical properties translate into significant gains in prediction accuracy over the more traditional GNN architectures.

Optimal Pose and Shape Estimation for Category-level 3D Object Perception

May 12, 2021

Jingnan Shi, Heng Yang, Luca Carlone

Abstract: We consider a category-level perception problem, where one is given 3D sensor data picturing an object of a given category (e.g., a car), and has to reconstruct the pose and shape of the object despite intra-class variability (i.e., different car models have different shapes). We consider an active shape model, where -- for an object category -- we are given a library of potential CAD models describing objects in that category, and we adopt a standard formulation where pose and shape estimation are formulated as a non-convex optimization. Our first contribution is to provide the first certifiably optimal solver for pose and shape estimation. In particular, we show that rotation estimation can be decoupled from the estimation of the object translation and shape, and we demonstrate that (i) the optimal object rotation can be computed via a tight (small-size) semidefinite relaxation, and (ii) the translation and shape parameters can be computed in closed-form given the rotation. Our second contribution is to add an outlier rejection layer to our solver, hence making it robust to a large number of misdetections. Towards this goal, we wrap our optimal solver in a robust estimation scheme based on graduated non-convexity. To further enhance robustness to outliers, we also develop the first graph-theoretic formulation to prune outliers in category-level perception, which removes outliers via convex hull and maximum clique computations; the resulting approach is robust to 70%-90% outliers. Our third contribution is an extensive experimental evaluation. Besides providing an ablation study on a simulated dataset and on the PASCAL3D+ dataset, we combine our solver with a deep-learned keypoint detector, and show that the resulting approach improves over the state of the art in vehicle pose estimation in the ApolloScape datasets.
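
The maximum-clique pruning mentioned in the abstract above can be illustrated with a simplified sketch. It assumes a single rigid model instead of a library of CAD models, so the compatibility test reduces to comparing pairwise distances (rigid motions preserve them); the convex hull test that the abstract uses to handle shape variation is not reproduced. The names `compatibility_graph` and `max_clique`, the `2 * noise_bound` threshold, and the plain Bron-Kerbosch search are assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def compatibility_graph(model_pts, measured_pts, noise_bound):
    """Connect keypoints i and j when their measured distance agrees with the model."""
    adjacency = {i: set() for i in range(len(model_pts))}
    for i, j in combinations(range(len(model_pts)), 2):
        d_model = dist(model_pts[i], model_pts[j])
        d_measured = dist(measured_pts[i], measured_pts[j])
        # Two inliers, each perturbed by at most noise_bound, can change
        # their mutual distance by at most 2 * noise_bound.
        if abs(d_model - d_measured) <= 2.0 * noise_bound:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def max_clique(adjacency):
    """Exact maximum clique via Bron-Kerbosch without pivoting (small graphs only)."""
    best = set()

    def expand(clique, candidates, excluded):
        nonlocal best
        if not candidates and not excluded:
            if len(clique) > len(best):
                best = set(clique)
            return
        for v in list(candidates):
            expand(clique | {v}, candidates & adjacency[v], excluded & adjacency[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adjacency), set())
    return best


if __name__ == "__main__":
    model = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
    # Model rotated 90 degrees about z and translated by (2, 3, 4);
    # the last keypoint is replaced by a gross misdetection.
    measured = [(2, 3, 4), (2, 4, 4), (1, 3, 4), (2, 3, 5), (9, 9, 9)]
    graph = compatibility_graph(model, measured, noise_bound=0.01)
    print(sorted(max_clique(graph)))
```

The keypoints in the largest mutually consistent set are kept as candidate inliers, and the remaining ones are discarded before a robust solver is run.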

NeBula: Quest for Robotic Autonomy in Challenging Environments; TEAM CoSTAR at the DARPA Subterranean Challenge

Mar 28, 2021

Ali Agha, Kyohei Otsu, Benjamin Morrell, David D. Fan, Rohan Thakker, Angel Santamaria-Navarro, Sung-Kyun Kim, Amanda Bouman, Xianmei Lei, Jeffrey Edlund (+62 more)

Abstract: This paper presents and discusses algorithms, hardware, and software architecture developed by TEAM CoSTAR (Collaborative SubTerranean Autonomous Robots), competing in the DARPA Subterranean Challenge. Specifically, it presents the techniques utilized within the Tunnel (2019) and Urban (2020) competitions, where CoSTAR achieved 2nd and 1st place, respectively. We also discuss CoSTAR's demonstrations in Martian-analog surface and subsurface (lava tubes) exploration. The paper introduces our autonomy solution, referred to as NeBula (Networked Belief-aware Perceptual Autonomy). NeBula is an uncertainty-aware framework that aims at enabling resilient and modular autonomy solutions by performing reasoning and decision making in the belief space (space of probability distributions over the robot and world states). We discuss various components of the NeBula framework, including: (i) geometric and semantic environment mapping; (ii) a multi-modal positioning system; (iii) traversability analysis and local planning; (iv) global motion planning and exploration behavior; (v) risk-aware mission planning; (vi) networking and decentralized reasoning; and (vii) learning-enabled adaptation. We discuss the performance of NeBula on several robot types (e.g., wheeled, legged, flying), in various environments. We discuss the specific results and lessons learned from fielding this solution in the challenging courses of the DARPA Subterranean Challenge competition.

* For team website, see https://costar.jpl.nasa.gov/

Dynamic Grasping with a "Soft" Drone: From Theory to Practice

Mar 11, 2021

Joshua Fishman, Samuel Ubellacker, Nathan Hughes, Luca Carlone

Abstract: Rigid grippers used in existing aerial manipulators require precise positioning to achieve successful grasps and transmit large contact forces that may destabilize the drone. This limits the speed during grasping and prevents "dynamic grasping", where the drone attempts to grasp an object while moving. On the other hand, biological systems (e.g., birds) rely on compliant and soft parts to dampen contact forces and compensate for grasping inaccuracy, enabling impressive feats. This paper presents the first prototype of a soft drone -- a quadrotor where traditional (i.e., rigid) landing gears are replaced with a soft tendon-actuated gripper to enable aggressive grasping. We provide three key contributions. First, we describe our soft drone prototype, including electro-mechanical design, software infrastructure, and fabrication. Second, we review the set of algorithms we use for trajectory optimization and control of the drone and the soft gripper; the algorithms combine state-of-the-art techniques for quadrotor control (i.e., an adaptive geometric controller) with advanced soft robotics models (i.e., a quasi-static finite element model). Finally, we evaluate our soft drone in physics simulations (using SOFA and Unity) and in real tests in a motion-capture room. Our drone is able to dynamically grasp objects of unknown shape where baseline approaches fail. Our physical prototype ensures consistent performance, achieving 91.7% successful grasps across 23 trials. We showcase dynamic grasping results in the video attachment.

* 8 pages, 8 figures, submitted to IROS 2021. arXiv admin note: text overlap with arXiv:2004.04238

Self-supervised Geometric Perception

Mar 04, 2021

Heng Yang, Wei Dong, Luca Carlone, Vladlen Koltun

Abstract: We present self-supervised geometric perception (SGP), the first general framework to learn a feature descriptor for correspondence matching without any ground-truth geometric model labels (e.g., camera poses, rigid transformations). Our first contribution is to formulate geometric perception as an optimization problem that jointly optimizes the feature descriptor and the geometric models given a large corpus of visual measurements (e.g., images, point clouds). Under this optimization formulation, we show that two important streams of research in vision, namely robust model fitting and deep feature learning, correspond to optimizing one block of the unknown variables while fixing the other block. This analysis naturally leads to our second contribution -- the SGP algorithm that performs alternating minimization to solve the joint optimization. SGP iteratively executes two meta-algorithms: a teacher that performs robust model fitting given learned features to generate geometric pseudo-labels, and a student that performs deep feature learning under noisy supervision of the pseudo-labels. As a third contribution, we apply SGP to two perception problems on large-scale real datasets, namely relative camera pose estimation on MegaDepth and point cloud registration on 3DMatch. We demonstrate that SGP achieves state-of-the-art performance that is on par with or superior to the supervised oracles trained using ground-truth labels.

* CVPR 2021, Oral presentation. 8 pages main results, 19 pages in total, including references and supplementary

LION: Lidar-Inertial Observability-Aware Navigator for Vision-Denied Environments

Feb 05, 2021

Andrea Tagliabue, Jesus Tordesillas, Xiaoyi Cai, Angel Santamaria-Navarro, Jonathan P. How, Luca Carlone, Ali-akbar Agha-mohammadi

Abstract: State estimation for robots navigating in GPS-denied and perceptually-degraded environments, such as underground tunnels, mines and planetary subsurface voids, remains challenging in robotics. Towards this goal, we present LION (Lidar-Inertial Observability-Aware Navigator), which is part of the state estimation framework developed by the team CoSTAR for the DARPA Subterranean Challenge, where the team achieved second and first places in the Tunnel and Urban circuits in August 2019 and February 2020, respectively. LION provides high-rate odometry estimates by fusing high-frequency inertial data from an IMU and low-rate relative pose estimates from a lidar via a fixed-lag sliding window smoother. LION does not require knowledge of relative positioning between lidar and IMU, as the extrinsic calibration is estimated online. In addition, LION is able to self-assess its performance using an observability metric that evaluates whether the pose estimate is geometrically ill-constrained. Odometry and confidence estimates are used by HeRO, a supervisory algorithm that provides robust estimates by switching between different odometry sources. In this paper we benchmark the performance of LION in perceptually-degraded subterranean environments, demonstrating its high technology readiness level for deployment in the field.

* 2020 International Symposium on Experimental Robotics (ISER 2020)

Kimera: from SLAM to Spatial Perception with 3D Dynamic Scene Graphs

Jan 24, 2021

Antoni Rosinol, Andrew Violette, Marcus Abate, Nathan Hughes, Yun Chang, Jingnan Shi, Arjun Gupta, Luca Carlone

Abstract: Humans are able to form a complex mental model of the environment they move in. This mental model captures geometric and semantic aspects of the scene, describes the environment at multiple levels of abstraction (e.g., objects, rooms, buildings), and includes static and dynamic entities and their relations (e.g., a person is in a room at a given time). In contrast, current robots' internal representations still provide a partial and fragmented understanding of the environment, either in the form of a sparse or dense set of geometric primitives (e.g., points, lines, planes, voxels) or as a collection of objects. This paper attempts to reduce the gap between robot and human perception by introducing a novel representation, a 3D Dynamic Scene Graph (DSG), that seamlessly captures metric and semantic aspects of a dynamic environment. A DSG is a layered graph where nodes represent spatial concepts at different levels of abstraction, and edges represent spatio-temporal relations among nodes. Our second contribution is Kimera, the first fully automatic method to build a DSG from visual-inertial data. Kimera includes state-of-the-art techniques for visual-inertial SLAM, metric-semantic 3D reconstruction, object localization, human pose and shape estimation, and scene parsing. Our third contribution is a comprehensive evaluation of Kimera in real-life datasets and photo-realistic simulations, including a newly released dataset, uHumans2, which simulates a collection of crowded indoor and outdoor scenes. Our evaluation shows that Kimera achieves state-of-the-art performance in visual-inertial SLAM, estimates an accurate 3D metric-semantic mesh model in real-time, and builds a DSG of a complex indoor environment with tens of objects and humans in minutes. Our final contribution shows how to use a DSG for real-time hierarchical semantic path-planning. The core modules in Kimera are open-source.

* 34 pages, 25 figures, 9 tables. arXiv admin note: text overlap with arXiv:2002.06289
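
The layered-graph idea in the Kimera abstract (nodes at different levels of abstraction, edges relating them) can be sketched as a small data structure. This is not Kimera's API: the class names, field names, and the three-layer ordering `object`, `room`, `building` (taken from the examples in the abstract) are assumptions made for illustration, and temporal attributes of dynamic entities are left out.

```python
from dataclasses import dataclass, field

# Assumed layer order, from lowest to highest level of abstraction.
LAYERS = ("object", "room", "building")


@dataclass
class Node:
    node_id: str
    layer: str       # one of LAYERS
    label: str       # semantic aspect, e.g. "chair"
    position: tuple  # metric aspect, (x, y, z) in meters


@dataclass
class LayeredSceneGraph:
    nodes: dict = field(default_factory=dict)      # node_id -> Node
    parent: dict = field(default_factory=dict)     # child_id -> parent_id
    neighbors: dict = field(default_factory=dict)  # node_id -> same-layer neighbor ids

    def add_node(self, node):
        if node.layer not in LAYERS:
            raise ValueError(f"unknown layer: {node.layer}")
        self.nodes[node.node_id] = node
        self.neighbors.setdefault(node.node_id, set())

    def add_parent_edge(self, child_id, parent_id):
        """Inter-layer edge: the parent sits exactly one layer above the child."""
        child_level = LAYERS.index(self.nodes[child_id].layer)
        parent_level = LAYERS.index(self.nodes[parent_id].layer)
        if parent_level != child_level + 1:
            raise ValueError("parent must sit exactly one layer above the child")
        self.parent[child_id] = parent_id

    def add_intra_layer_edge(self, a, b):
        """Intra-layer edge, e.g. two rooms that are connected to each other."""
        if self.nodes[a].layer != self.nodes[b].layer:
            raise ValueError("intra-layer edges connect nodes of the same layer")
        self.neighbors[a].add(b)
        self.neighbors[b].add(a)

    def ancestors(self, node_id):
        """Walk up the hierarchy, e.g. object -> room -> building."""
        chain = []
        while node_id in self.parent:
            node_id = self.parent[node_id]
            chain.append(node_id)
        return chain


if __name__ == "__main__":
    graph = LayeredSceneGraph()
    graph.add_node(Node("chair_1", "object", "chair", (1.0, 2.0, 0.0)))
    graph.add_node(Node("kitchen", "room", "kitchen", (1.5, 2.5, 0.0)))
    graph.add_node(Node("hallway", "room", "hallway", (4.0, 2.5, 0.0)))
    graph.add_node(Node("building_A", "building", "office", (3.0, 3.0, 0.0)))
    graph.add_parent_edge("chair_1", "kitchen")
    graph.add_parent_edge("kitchen", "building_A")
    graph.add_parent_edge("hallway", "building_A")
    graph.add_intra_layer_edge("kitchen", "hallway")
    print(graph.ancestors("chair_1"))
```

Walking the parent chain is the kind of query a hierarchical planner could use: plan between rooms first, then refine the path inside the room that contains the goal object.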
