Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mac Schwager

FAST-Splat: Fast, Ambiguity-Free Semantics Transfer in Gaussian Splatting

Nov 20, 2024

Ola Shorinwa, Jiankai Sun, Mac Schwager

Abstract:We present FAST-Splat for fast, ambiguity-free semantic Gaussian Splatting, which seeks to address the main limitations of existing semantic Gaussian Splatting methods, namely: slow training and rendering speeds; high memory usage; and ambiguous semantic object localization. In deriving FAST-Splat , we formulate open-vocabulary semantic Gaussian Splatting as the problem of extending closed-set semantic distillation to the open-set (open-vocabulary) setting, enabling FAST-Splat to provide precise semantic object localization results, even when prompted with ambiguous user-provided natural-language queries. Further, by exploiting the explicit form of the Gaussian Splatting scene representation to the fullest extent, FAST-Splat retains the remarkable training and rendering speeds of Gaussian Splatting. Specifically, while existing semantic Gaussian Splatting methods distill semantics into a separate neural field or utilize neural models for dimensionality reduction, FAST-Splat directly augments each Gaussian with specific semantic codes, preserving the training, rendering, and memory-usage advantages of Gaussian Splatting over neural field methods. These Gaussian-specific semantic codes, together with a hash-table, enable semantic similarity to be measured with open-vocabulary user prompts and further enable FAST-Splat to respond with unambiguous semantic object labels and 3D masks, unlike prior methods. In experiments, we demonstrate that FAST-Splat is 4x to 6x faster to train with a 13x faster data pre-processing step, achieves between 18x to 75x faster rendering speeds, and requires about 3x smaller GPU memory, compared to the best-competing semantic Gaussian Splatting methods. Further, FAST-Splat achieves relatively similar or better semantic segmentation performance compared to existing methods. After the review period, we will provide links to the project website and the codebase.

Via

Access Paper or Ask Questions

Get a Grip: Multi-Finger Grasp Evaluation at Scale Enables Robust Sim-to-Real Transfer

Oct 31, 2024

Tyler Ga Wei Lum, Albert H. Li, Preston Culbertson, Krishnan Srinivasan, Aaron D. Ames, Mac Schwager, Jeannette Bohg

Abstract:This work explores conditions under which multi-finger grasping algorithms can attain robust sim-to-real transfer. While numerous large datasets facilitate learning generative models for multi-finger grasping at scale, reliable real-world dexterous grasping remains challenging, with most methods degrading when deployed on hardware. An alternate strategy is to use discriminative grasp evaluation models for grasp selection and refinement, conditioned on real-world sensor measurements. This paradigm has produced state-of-the-art results for vision-based parallel-jaw grasping, but remains unproven in the multi-finger setting. In this work, we find that existing datasets and methods have been insufficient for training discriminitive models for multi-finger grasping. To train grasp evaluators at scale, datasets must provide on the order of millions of grasps, including both positive and negative examples, with corresponding visual data resembling measurements at inference time. To that end, we release a new, open-source dataset of 3.5M grasps on 4.3K objects annotated with RGB images, point clouds, and trained NeRFs. Leveraging this dataset, we train vision-based grasp evaluators that outperform both analytic and generative modeling-based baselines on extensive simulated and real-world trials across a diverse range of objects. We show via numerous ablations that the key factor for performance is indeed the evaluator, and that its quality degrades as the dataset shrinks, demonstrating the importance of our new dataset. Project website at: https://sites.google.com/view/get-a-grip-dataset.

Via

Access Paper or Ask Questions

DisCo: Distributed Contact-Rich Trajectory Optimization for Forceful Multi-Robot Collaboration

Oct 30, 2024

Ola Shorinwa, Matthew Devlin, Elliot W. Hawkes, Mac Schwager

Figure 1 for DisCo: Distributed Contact-Rich Trajectory Optimization for Forceful Multi-Robot Collaboration

Figure 2 for DisCo: Distributed Contact-Rich Trajectory Optimization for Forceful Multi-Robot Collaboration

Figure 3 for DisCo: Distributed Contact-Rich Trajectory Optimization for Forceful Multi-Robot Collaboration

Figure 4 for DisCo: Distributed Contact-Rich Trajectory Optimization for Forceful Multi-Robot Collaboration

Abstract:We present DisCo, a distributed algorithm for contact-rich, multi-robot tasks. DisCo is a distributed contact-implicit trajectory optimization algorithm, which allows a group of robots to optimize a time sequence of forces to objects and to their environment to accomplish tasks such as collaborative manipulation, robot team sports, and modular robot locomotion. We build our algorithm on a variant of the Alternating Direction Method of Multipliers (ADMM), where each robot computes its own contact forces and contact-switching events from a smaller single-robot, contact-implicit trajectory optimization problem, while cooperating with other robots through dual variables, enforcing constraints between robots. Each robot iterates between solving its local problem, and communicating over a wireless mesh network to enforce these consistency constraints with its neighbors, ultimately converging to a coordinated plan for the group. The local problems solved by each robot are significantly less challenging than a centralized problem with all robots' contact forces and switching events, improving the computational efficiency, while also preserving the privacy of some aspects of each robot's operation. We demonstrate the effectiveness of our algorithm in simulations of collaborative manipulation, multi-robot team sports scenarios, and in modular robot locomotion, where DisCo achieves $3$x higher success rates with a 2.5x to 5x faster computation time. Further, we provide results of hardware experiments on a modular truss robot, with three collaborating truss nodes planning individually while working together to produce a punctuated rolling-gate motion of the composite structure. Videos are available on the project page: https://disco-opt.github.io.

Via

Access Paper or Ask Questions

Hierarchical Hybrid Learning for Long-Horizon Contact-Rich Robotic Assembly

Sep 24, 2024

Jiankai Sun, Aidan Curtis, Yang You, Yan Xu, Michael Koehle, Leonidas Guibas, Sachin Chitta, Mac Schwager, Hui Li

Figure 1 for Hierarchical Hybrid Learning for Long-Horizon Contact-Rich Robotic Assembly

Figure 2 for Hierarchical Hybrid Learning for Long-Horizon Contact-Rich Robotic Assembly

Figure 3 for Hierarchical Hybrid Learning for Long-Horizon Contact-Rich Robotic Assembly

Figure 4 for Hierarchical Hybrid Learning for Long-Horizon Contact-Rich Robotic Assembly

Abstract:Generalizable long-horizon robotic assembly requires reasoning at multiple levels of abstraction. End-to-end imitation learning (IL) has been proven a promising approach, but it requires a large amount of demonstration data for training and often fails to meet the high-precision requirement of assembly tasks. Reinforcement Learning (RL) approaches have succeeded in high-precision assembly tasks, but suffer from sample inefficiency and hence, are less competent at long-horizon tasks. To address these challenges, we propose a hierarchical modular approach, named ARCH (Adaptive Robotic Composition Hierarchy), which enables long-horizon high-precision assembly in contact-rich settings. ARCH employs a hierarchical planning framework, including a low-level primitive library of continuously parameterized skills and a high-level policy. The low-level primitive library includes essential skills for assembly tasks, such as grasping and inserting. These primitives consist of both RL and model-based controllers. The high-level policy, learned via imitation learning from a handful of demonstrations, selects the appropriate primitive skills and instantiates them with continuous input parameters. We extensively evaluate our approach on a real robot manipulation platform. We show that while trained on a single task, ARCH generalizes well to unseen tasks and outperforms baseline methods in terms of success rate and data efficiency. Videos can be found at https://long-horizon-assembly.github.io.

Via

Access Paper or Ask Questions

E2Map: Experience-and-Emotion Map for Self-Reflective Robot Navigation with Language Models

Sep 16, 2024

Chan Kim, Keonwoo Kim, Mintaek Oh, Hanbi Baek, Jiyang Lee, Donghwi Jung, Soojin Woo, Younkyung Woo, John Tucker, Roya Firoozi(+3 more)

Figure 1 for E2Map: Experience-and-Emotion Map for Self-Reflective Robot Navigation with Language Models

Figure 2 for E2Map: Experience-and-Emotion Map for Self-Reflective Robot Navigation with Language Models

Figure 3 for E2Map: Experience-and-Emotion Map for Self-Reflective Robot Navigation with Language Models

Figure 4 for E2Map: Experience-and-Emotion Map for Self-Reflective Robot Navigation with Language Models

Abstract:Large language models (LLMs) have shown significant potential in guiding embodied agents to execute language instructions across a range of tasks, including robotic manipulation and navigation. However, existing methods are primarily designed for static environments and do not leverage the agent's own experiences to refine its initial plans. Given that real-world environments are inherently stochastic, initial plans based solely on LLMs' general knowledge may fail to achieve their objectives, unlike in static scenarios. To address this limitation, this study introduces the Experience-and-Emotion Map (E2Map), which integrates not only LLM knowledge but also the agent's real-world experiences, drawing inspiration from human emotional responses. The proposed methodology enables one-shot behavior adjustments by updating the E2Map based on the agent's experiences. Our evaluation in stochastic navigation environments, including both simulations and real-world scenarios, demonstrates that the proposed method significantly enhances performance in stochastic environments compared to existing LLM-based approaches. Code and supplementary materials are available at https://e2map.github.io/.

* 19 pages, 28 figures. Project page: https://e2map.github.io

Via

Access Paper or Ask Questions

SAFER-Splat: A Control Barrier Function for Safe Navigation with Online Gaussian Splatting Maps

Sep 15, 2024

Timothy Chen, Aiden Swann, Javier Yu, Ola Shorinwa, Riku Murai, Monroe Kennedy III, Mac Schwager

Abstract:SAFER-Splat (Simultaneous Action Filtering and Environment Reconstruction) is a real-time, scalable, and minimally invasive action filter, based on control barrier functions, for safe robotic navigation in a detailed map constructed at runtime using Gaussian Splatting (GSplat). We propose a novel Control Barrier Function (CBF) that not only induces safety with respect to all Gaussian primitives in the scene, but when synthesized into a controller, is capable of processing hundreds of thousands of Gaussians while maintaining a minimal memory footprint and operating at 15 Hz during online Splat training. Of the total compute time, a small fraction of it consumes GPU resources, enabling uninterrupted training. The safety layer is minimally invasive, correcting robot actions only when they are unsafe. To showcase the safety filter, we also introduce SplatBridge, an open-source software package built with ROS for real-time GSplat mapping for robots. We demonstrate the safety and robustness of our pipeline first in simulation, where our method is 20-50x faster, safer, and less conservative than competing methods based on neural radiance fields. Further, we demonstrate simultaneous GSplat mapping and safety filtering on a drone hardware platform using only on-board perception. We verify that under teleoperation a human pilot cannot invoke a collision. Our videos and codebase can be found at https://chengine.github.io/safer-splat.

Via

Access Paper or Ask Questions

Gen-Swarms: Adapting Deep Generative Models to Swarms of Drones

Aug 28, 2024

Carlos Plou, Pablo Pueyo, Ruben Martinez-Cantin, Mac Schwager, Ana C. Murillo, Eduardo Montijano

Figure 1 for Gen-Swarms: Adapting Deep Generative Models to Swarms of Drones

Figure 2 for Gen-Swarms: Adapting Deep Generative Models to Swarms of Drones

Figure 3 for Gen-Swarms: Adapting Deep Generative Models to Swarms of Drones

Figure 4 for Gen-Swarms: Adapting Deep Generative Models to Swarms of Drones

Abstract:Gen-Swarms is an innovative method that leverages and combines the capabilities of deep generative models with reactive navigation algorithms to automate the creation of drone shows. Advancements in deep generative models, particularly diffusion models, have demonstrated remarkable effectiveness in generating high-quality 2D images. Building on this success, various works have extended diffusion models to 3D point cloud generation. In contrast, alternative generative models such as flow matching have been proposed, offering a simple and intuitive transition from noise to meaningful outputs. However, the application of flow matching models to 3D point cloud generation remains largely unexplored. Gen-Swarms adapts these models to automatically generate drone shows. Existing 3D point cloud generative models create point trajectories which are impractical for drone swarms. In contrast, our method not only generates accurate 3D shapes but also guides the swarm motion, producing smooth trajectories and accounting for potential collisions through a reactive navigation algorithm incorporated into the sampling process. For example, when given a text category like Airplane, Gen-Swarms can rapidly and continuously generate numerous variations of 3D airplane shapes. Our experiments demonstrate that this approach is particularly well-suited for drone shows, providing feasible trajectories, creating representative final shapes, and significantly enhancing the overall performance of drone show generation.

Via

Access Paper or Ask Questions

Out-of-Distribution Runtime Adaptation with Conformalized Neural Network Ensembles

Jun 04, 2024

Polo Contreras, Ola Shorinwa, Mac Schwager

Figure 1 for Out-of-Distribution Runtime Adaptation with Conformalized Neural Network Ensembles

Figure 2 for Out-of-Distribution Runtime Adaptation with Conformalized Neural Network Ensembles

Figure 3 for Out-of-Distribution Runtime Adaptation with Conformalized Neural Network Ensembles

Figure 4 for Out-of-Distribution Runtime Adaptation with Conformalized Neural Network Ensembles

Abstract:We present a method to integrate real-time out-of-distribution (OOD) detection for neural network trajectory predictors, and to adapt the control strategy of a robot (e.g., a self-driving car or drone) to preserve safety while operating in OOD regimes. Specifically, we use a neural network ensemble to predict the trajectory for a dynamic obstacle (such as a pedestrian), and use the maximum singular value of the empirical covariance among the ensemble as a signal for OOD detection. We calibrate this signal with a small fraction of held-out training data using the methodology of conformal prediction, to derive an OOD detector with probabilistic guarantees on the false-positive rate of the detector, given a user-specified confidence level. During in-distribution operation, we use an MPC controller to avoid collisions with the obstacle based on the trajectory predicted by the neural network ensemble. When OOD conditions are detected, we switch to a reachability-based controller to guarantee safety under the worst-case actions of the obstacle. We verify our method in extensive autonomous driving simulations in a pedestrian crossing scenario, showing that our OOD detector obtains the desired accuracy rate within a theoretically-predicted range. We also demonstrate the effectiveness of our method with real pedestrian data. We show improved safety and less conservatism in comparison with two state-of-the-art methods that also use conformal prediction, but without OOD adaptation.

Via

Access Paper or Ask Questions

Splat-MOVER: Multi-Stage, Open-Vocabulary Robotic Manipulation via Editable Gaussian Splatting

May 14, 2024

Ola Shorinwa, Johnathan Tucker, Aliyah Smith, Aiden Swann, Timothy Chen, Roya Firoozi, Monroe Kennedy III, Mac Schwager

Figure 1 for Splat-MOVER: Multi-Stage, Open-Vocabulary Robotic Manipulation via Editable Gaussian Splatting

Figure 2 for Splat-MOVER: Multi-Stage, Open-Vocabulary Robotic Manipulation via Editable Gaussian Splatting

Figure 3 for Splat-MOVER: Multi-Stage, Open-Vocabulary Robotic Manipulation via Editable Gaussian Splatting

Figure 4 for Splat-MOVER: Multi-Stage, Open-Vocabulary Robotic Manipulation via Editable Gaussian Splatting

Abstract:We present Splat-MOVER, a modular robotics stack for open-vocabulary robotic manipulation, which leverages the editability of Gaussian Splatting (GSplat) scene representations to enable multi-stage manipulation tasks. Splat-MOVER consists of: (i) ASK-Splat, a GSplat representation that distills latent codes for language semantics and grasp affordance into the 3D scene. ASK-Splat enables geometric, semantic, and affordance understanding of 3D scenes, which is critical for many robotics tasks; (ii) SEE-Splat, a real-time scene-editing module using 3D semantic masking and infilling to visualize the motions of objects that result from robot interactions in the real-world. SEE-Splat creates a "digital twin" of the evolving environment throughout the manipulation task; and (iii) Grasp-Splat, a grasp generation module that uses ASK-Splat and SEE-Splat to propose candidate grasps for open-world objects. ASK-Splat is trained in real-time from RGB images in a brief scanning phase prior to operation, while SEE-Splat and Grasp-Splat run in real-time during operation. We demonstrate the superior performance of Splat-MOVER in hardware experiments on a Kinova robot compared to two recent baselines in four single-stage, open-vocabulary manipulation tasks, as well as in four multi-stage manipulation tasks using the edited scene to reflect scene changes due to prior manipulation stages, which is not possible with the existing baselines. Code for this project and a link to the project page will be made available soon.

Via

Access Paper or Ask Questions

How Generalizable Is My Behavior Cloning Policy? A Statistical Approach to Trustworthy Performance Evaluation

May 08, 2024

Joseph A. Vincent, Haruki Nishimura, Masha Itkina, Paarth Shah, Mac Schwager, Thomas Kollar

Figure 1 for How Generalizable Is My Behavior Cloning Policy? A Statistical Approach to Trustworthy Performance Evaluation

Figure 2 for How Generalizable Is My Behavior Cloning Policy? A Statistical Approach to Trustworthy Performance Evaluation

Figure 3 for How Generalizable Is My Behavior Cloning Policy? A Statistical Approach to Trustworthy Performance Evaluation

Figure 4 for How Generalizable Is My Behavior Cloning Policy? A Statistical Approach to Trustworthy Performance Evaluation

Abstract:With the rise of stochastic generative models in robot policy learning, end-to-end visuomotor policies are increasingly successful at solving complex tasks by learning from human demonstrations. Nevertheless, since real-world evaluation costs afford users only a small number of policy rollouts, it remains a challenge to accurately gauge the performance of such policies. This is exacerbated by distribution shifts causing unpredictable changes in performance during deployment. To rigorously evaluate behavior cloning policies, we present a framework that provides a tight lower-bound on robot performance in an arbitrary environment, using a minimal number of experimental policy rollouts. Notably, by applying the standard stochastic ordering to robot performance distributions, we provide a worst-case bound on the entire distribution of performance (via bounds on the cumulative distribution function) for a given task. We build upon established statistical results to ensure that the bounds hold with a user-specified confidence level and tightness, and are constructed from as few policy rollouts as possible. In experiments we evaluate policies for visuomotor manipulation in both simulation and hardware. Specifically, we (i) empirically validate the guarantees of the bounds in simulated manipulation settings, (ii) find the degree to which a learned policy deployed on hardware generalizes to new real-world environments, and (iii) rigorously compare two policies tested in out-of-distribution settings. Our experimental data, code, and implementation of confidence bounds are open-source.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Via

Access Paper or Ask Questions