Abstract:Quadruped robots are proliferating in industrial environments where they carry sensor suites and serve as autonomous inspection platforms. Despite the advantages of legged robots over their wheeled counterparts on rough and uneven terrain, they are still yet to be able to reliably negotiate ubiquitous features of industrial infrastructure: ladders. Inability to traverse ladders prevents quadrupeds from inspecting dangerous locations, puts humans in harm's way, and reduces industrial site productivity. In this paper, we learn quadrupedal ladder climbing via a reinforcement learning-based control policy and a complementary hooked end-effector. We evaluate the robustness in simulation across different ladder inclinations, rung geometries, and inter-rung spacings. On hardware, we demonstrate zero-shot transfer with an overall 90% success rate at ladder angles ranging from 70{\deg} to 90{\deg}, consistent climbing performance during unmodeled perturbations, and climbing speeds 232x faster than the state of the art. This work expands the scope of industrial quadruped robot applications beyond inspection on nominal terrains to challenging infrastructural features in the environment, highlighting synergies between robot morphology and control policy when performing complex skills. More information can be found at the project website: https://sites.google.com/leggedrobotics.com/climbingladders.




Abstract:Combining manipulation with the mobility of legged robots is essential for a wide range of robotic applications. However, integrating an arm with a mobile base significantly increases the system's complexity, making precise end-effector control challenging. Existing model-based approaches are often constrained by their modeling assumptions, leading to limited robustness. Meanwhile, recent Reinforcement Learning (RL) implementations restrict the arm's workspace to be in front of the robot or track only the position to obtain decent tracking accuracy. In this work, we address these limitations by introducing a whole-body RL formulation for end-effector pose tracking in a large workspace on rough, unstructured terrains. Our proposed method involves a terrain-aware sampling strategy for the robot's initial configuration and end-effector pose commands, as well as a game-based curriculum to extend the robot's operating range. We validate our approach on the ANYmal quadrupedal robot with a six DoF robotic arm. Through our experiments, we show that the learned controller achieves precise command tracking over a large workspace and adapts across varying terrains such as stairs and slopes. On deployment, it achieves a pose-tracking error of 2.64 cm and 3.64 degrees, outperforming existing competitive baselines.
Abstract:Autonomous robot navigation in off-road environments requires a comprehensive understanding of the terrain geometry and traversability. The degraded perceptual conditions and sparse geometric information at longer ranges make the problem challenging especially when driving at high speeds. Furthermore, the sensing-to-mapping latency and the look-ahead map range can limit the maximum speed of the vehicle. Building on top of the recent work RoadRunner, in this work, we address the challenge of long-range (100 m) traversability estimation. Our RoadRunner (M&M) is an end-to-end learning-based framework that directly predicts the traversability and elevation maps at multiple ranges (50 m, 100 m) and resolutions (0.2 m, 0.8 m) taking as input multiple images and a LiDAR voxel map. Our method is trained in a self-supervised manner by leveraging the dense supervision signal generated by fusing predictions from an existing traversability estimation stack (X-Racer) in hindsight and satellite Digital Elevation Maps. RoadRunner M&M achieves a significant improvement of up to 50% for elevation mapping and 30% for traversability estimation over RoadRunner, and is able to predict in 30% more regions compared to X-Racer while achieving real-time performance. Experiments on various out-of-distribution datasets also demonstrate that our data-driven approach starts to generalize to novel unstructured environments. We integrate our proposed framework in closed-loop with the path planner to demonstrate autonomous high-speed off-road robotic navigation in challenging real-world environments. Project Page: https://leggedrobotics.github.io/roadrunner_mm/
Abstract:Pedipulation leverages the feet of legged robots for mobile manipulation, eliminating the need for dedicated robotic arms. While previous works have showcased blind and task-specific pedipulation skills, they fail to account for static and dynamic obstacles in the environment. To address this limitation, we introduce a reinforcement learning-based approach to train a whole-body obstacle-aware policy that tracks foot position commands while simultaneously avoiding obstacles. Despite training the policy in only five different static scenarios in simulation, we show that it generalizes to unknown environments with different numbers and types of obstacles. We analyze the performance of our method through a set of simulation experiments and successfully deploy the learned policy on the ANYmal quadruped, demonstrating its capability to follow foot commands while navigating around static and dynamic obstacles.




Abstract:Using doors is a longstanding challenge in robotics and is of significant practical interest in giving robots greater access to human-centric spaces. The task is challenging due to the need for online adaptation to varying door properties and precise control in manipulating the door panel and navigating through the confined doorway. To address this, we propose a learning-based controller for a legged manipulator to open and traverse through doors. The controller is trained using a teacher-student approach in simulation to learn robust task behaviors as well as estimate crucial door properties during the interaction. Unlike previous works, our approach is a single control policy that can handle both push and pull doors through learned behaviour which infers the opening direction during deployment without prior knowledge. The policy was deployed on the ANYmal legged robot with an arm and achieved a success rate of 95.0% in repeated trials conducted in an experimental setting. Additional experiments validate the policy's effectiveness and robustness to various doors and disturbances. A video overview of the method and experiments can be found at youtu.be/tQDZXN_k5NU.




Abstract:Identifying the physical properties of the surrounding environment is essential for robotic locomotion and navigation to deal with non-geometric hazards, such as slippery and deformable terrains. It would be of great benefit for robots to anticipate these extreme physical properties before contact; however, estimating environmental physical parameters from vision is still an open challenge. Animals can achieve this by using their prior experience and knowledge of what they have seen and how it felt. In this work, we propose a cross-modal self-supervised learning framework for vision-based environmental physical parameter estimation, which paves the way for future physical-property-aware locomotion and navigation. We bridge the gap between existing policies trained in simulation and identification of physical terrain parameters from vision. We propose to train a physical decoder in simulation to predict friction and stiffness from multi-modal input. The trained network allows the labeling of real-world images with physical parameters in a self-supervised manner to further train a visual network during deployment, which can densely predict the friction and stiffness from image data. We validate our physical decoder in simulation and the real world using a quadruped ANYmal robot, outperforming an existing baseline method. We show that our visual network can predict the physical properties in indoor and outdoor experiments while allowing fast adaptation to new environments.




Abstract:The ICP registration algorithm has been a preferred method for LiDAR-based robot localization for nearly a decade. However, even in modern SLAM solutions, ICP can degrade and become unreliable in geometrically ill-conditioned environments. Current solutions primarily focus on utilizing additional sources of information, such as external odometry, to either replace the degenerate directions of the optimization solution or add additional constraints in a sensor-fusion setup afterward. In response, this work investigates and compares new and existing degeneracy mitigation methods for robust LiDAR-based localization and analyzes the efficacy of these approaches in degenerate environments for the first time in the literature at this scale. Specifically, this work proposes and investigates i) the incorporation of different types of constraints into the ICP algorithm, ii) the effect of using active or passive degeneracy mitigation techniques, and iii) the choice of utilizing global point cloud registration methods on the ill-conditioned ICP problem in LiDAR degenerate environments. The study results are validated through multiple real-world field and simulated experiments. The analysis shows that active optimization degeneracy mitigation is necessary and advantageous in the absence of reliable external estimate assistance for LiDAR-SLAM. Furthermore, introducing degeneracy-aware hard constraints in the optimization before or during the optimization is shown to perform better in the wild than by including the constraints after. Moreover, with heuristic fine-tuned parameters, soft constraints can provide equal or better results in complex ill-conditioned scenarios. The implementations used in the analysis of this work are made publicly available to the community.
Abstract:Reconstructing the 3D shape of a deformable environment from the information captured by a moving depth camera is highly relevant to surgery. The underlying challenge is the fact that simultaneously estimating camera motion and tissue deformation in a fully deformable scene is an ill-posed problem, especially from a single arbitrarily moving viewpoint. Current solutions are often organ-specific and lack the robustness required to handle large deformations. Here we propose a multi-viewpoint global optimization framework that can flexibly integrate the output of low-level perception modules (data association, depth, and relative scene flow) with kinematic and scene-modeling priors to jointly estimate multiple camera motions and absolute scene flow. We use simulated noisy data to show three practical examples that successfully constrain the convergence to a unique solution. Overall, our method shows robustness to combined noisy input measures and can process hundreds of points in a few milliseconds. MultiViPerFrOG builds a generalized learning-free scaffolding for spatio-temporal encoding that can unlock advanced surgical scene representations and will facilitate the development of the computer-assisted-surgery technologies of the future.




Abstract:Current visual navigation systems often treat the environment as static, lacking the ability to adaptively interact with obstacles. This limitation leads to navigation failure when encountering unavoidable obstructions. In response, we introduce IN-Sight, a novel approach to self-supervised path planning, enabling more effective navigation strategies through interaction with obstacles. Utilizing RGB-D observations, IN-Sight calculates traversability scores and incorporates them into a semantic map, facilitating long-range path planning in complex, maze-like environments. To precisely navigate around obstacles, IN-Sight employs a local planner, trained imperatively on a differentiable costmap using representation learning techniques. The entire framework undergoes end-to-end training within the state-of-the-art photorealistic Intel SPEAR Simulator. We validate the effectiveness of IN-Sight through extensive benchmarking in a variety of simulated scenarios and ablation studies. Moreover, we demonstrate the system's real-world applicability with zero-shot sim-to-real transfer, deploying our planner on the legged robot platform ANYmal, showcasing its practical potential for interactive navigation in real environments.
Abstract:Radiance field methods such as Neural Radiance Fields (NeRFs) or 3D Gaussian Splatting (3DGS), have revolutionized graphics and novel view synthesis. Their ability to synthesize new viewpoints with photo-realistic quality, as well as capture complex volumetric and specular scenes, makes them an ideal visualization for robotic teleoperation setups. Direct camera teleoperation provides high-fidelity operation at the cost of maneuverability, while reconstruction-based approaches offer controllable scenes with lower fidelity. With this in mind, we propose replacing the traditional reconstruction-visualization components of the robotic teleoperation pipeline with online Radiance Fields, offering highly maneuverable scenes with photorealistic quality. As such, there are three main contributions to state of the art: (1) online training of Radiance Fields using live data from multiple cameras, (2) support for a variety of radiance methods including NeRF and 3DGS, (3) visualization suite for these methods including a virtual reality scene. To enable seamless integration with existing setups, these components were tested with multiple robots in multiple configurations and were displayed using traditional tools as well as the VR headset. The results across methods and robots were compared quantitatively to a baseline of mesh reconstruction, and a user study was conducted to compare the different visualization methods. For videos and code, check out https://leggedrobotics.github.io/rffr.github.io/.