Robots benefit from being able to classify the objects they interact with or manipulate based on their material properties. This capability supports fine manipulation of complex objects through proper grasp pose and force selection. Prior work has focused on haptic or visual processing to determine material type at grasp time. In this work, we introduce a novel parallel robot gripper design and a method for collecting spectral readings and visual images from within the gripper finger. We train a nonlinear Support Vector Machine (SVM) that classifies the material of the object about to be grasped through recursive estimation, with increasing confidence as the distance from the fingertips to the object decreases. To validate the hardware design and classification method, we collect samples from 16 real and fake fruit varieties (the fake fruit composed of polystyrene/plastic), resulting in a dataset containing spectral curves, scene images, and high-resolution texture images as the objects are grasped, lifted, and released. Our modeling method achieves an accuracy of 96.4% on a 32-class decision problem, a 29.4% improvement over state-of-the-art computer vision algorithms at distinguishing between visually similar materials. In contrast to prior work, our recursive estimation model accounts for increasing spectral signal strength and allows decisions to be made as the gripper approaches an object. We conclude that spectroscopy is a promising sensing modality for enabling robots not only to classify grasped objects but also to understand their underlying material composition.
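A minimal sketch of the recursive estimation idea described above (not the authors' implementation): an RBF-kernel SVM with calibrated probability outputs supplies a per-reading class likelihood, and a running posterior over material classes is updated multiplicatively as the fingertips approach the object. The feature dimensions, class counts, and data below are illustrative placeholders.

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder training set: spectral curves as feature vectors (64 bands),
# 10 samples for each of the 32 material classes.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(320, 64))
y_train = np.repeat(np.arange(32), 10)

# Nonlinear (RBF-kernel) SVM with probability outputs enabled.
clf = SVC(kernel="rbf", probability=True).fit(X_train, y_train)

def recursive_update(posterior, reading, clf):
    """One recursive-estimation step: fuse a new spectral reading into the posterior."""
    likelihood = clf.predict_proba(reading[None, :])[0]
    posterior = posterior * likelihood   # multiply in the new evidence
    return posterior / posterior.sum()   # renormalize

# As the gripper closes, signal strength grows and the posterior sharpens,
# so a confident decision can be made before contact.
posterior = np.full(32, 1.0 / 32)              # uniform prior over 32 classes
for reading in rng.normal(size=(5, 64)):       # placeholder approach sequence
    posterior = recursive_update(posterior, reading, clf)
print("predicted class:", posterior.argmax())
```

In practice the approach sequence would be real spectral readings captured at decreasing fingertip-to-object distances, letting the robot commit to a grasp strategy early when the posterior is already peaked.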
Object pose estimation methods allow robots to locate objects in unstructured environments, a highly desired capability for autonomous manipulation since robots must estimate precise object poses in order to manipulate them. In this paper, we investigate tactile pose estimation and manipulation for category-level objects. Our proposed method uses a Bayes filter with a learned tactile observation model and a deterministic motion model. We then train policies using deep reinforcement learning, where the agents act on the belief estimates from the Bayes filter. Our models are trained in simulation and transferred to the real world. We analyze the reliability and performance of our framework through a series of simulated and real-world experiments and compare our method to baseline work. Our results show that the learned tactile observation model can localize the pose of novel objects at 2-mm and 1-degree resolution for position and orientation, respectively. Furthermore, we evaluate on a bottle-opening task in which the gripper must reach a desired grasp state.
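A schematic discrete Bayes filter in the spirit of this pipeline (a sketch under simplifying assumptions, not the paper's code): the belief over a discretized pose grid is propagated by a deterministic motion model and reweighted by a tactile observation model. The grid sizes are illustrative, and the learned observation model is stood in for by a placeholder function.

```python
import numpy as np

# Discretized belief over (x, y, theta); bin counts are illustrative.
N_X, N_Y, N_TH = 50, 50, 360
belief = np.full((N_X, N_Y, N_TH), 1.0 / (N_X * N_Y * N_TH))

def motion_update(belief, action):
    """Deterministic motion model: shift the belief by the commanded displacement."""
    dx, dy, dth = action  # displacement in grid cells / angle bins
    return np.roll(belief, shift=(dx, dy, dth), axis=(0, 1, 2))

def observation_likelihood(tactile_reading):
    """Placeholder for the learned tactile observation model, which would map a
    tactile reading to a likelihood over all pose hypotheses (e.g., a neural net)."""
    return np.random.default_rng(0).random((N_X, N_Y, N_TH))

def measurement_update(belief, tactile_reading):
    """Bayes update: reweight the belief by the observation likelihood."""
    belief = belief * observation_likelihood(tactile_reading)
    return belief / belief.sum()

# One filter step: predict with the motion model, correct with a tactile reading.
belief = motion_update(belief, action=(1, 0, 2))
belief = measurement_update(belief, tactile_reading=None)
print("MAP pose bin:", np.unravel_index(belief.argmax(), belief.shape))
```

A downstream reinforcement-learning policy, as described in the abstract, would consume this belief (or summary statistics of it) rather than the raw tactile readings, which is what allows acting under pose uncertainty.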
Localizing and tracking the pose of a robotic gripper are necessary skills for manipulation tasks. However, manipulators with imprecise kinematic models (e.g., low-cost arms) or with unknown world coordinates (e.g., poor camera-arm calibration) cannot locate the gripper with respect to the world. In these circumstances, we can leverage tactile feedback between the gripper and the environment. In this paper, we present learnable Bayes filter models that localize robotic grippers using tactile feedback. We propose a novel observation model that conditions tactile feedback on visual maps of the environment, along with a motion model, to recursively estimate the gripper's location. Our models are trained in simulation with self-supervision and transferred to the real world. We evaluate our method on a tabletop localization task in which the gripper interacts with objects, and we report results in simulation and on a real robot, generalizing over different sizes, shapes, and configurations of objects.
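To make the map-conditioned observation model concrete, here is a toy 2D version (an illustrative sketch, not the trained model): a binary contact reading is scored against a visual occupancy map of the tabletop, and the resulting likelihood drives the same recursive belief update. The map, object placement, and hit probability are assumptions for the example.

```python
import numpy as np

# Toy visual map of the tabletop (1 = object, 0 = free space).
occupancy_map = np.zeros((40, 40))
occupancy_map[10:15, 20:30] = 1.0  # a placeholder object

belief = np.full(occupancy_map.shape, 1.0 / occupancy_map.size)

def observation_update(belief, contact, occ_map, p_hit=0.9):
    """Map-conditioned observation model: contact is likely where the map shows
    an object, no-contact is likely in free space. In the paper this mapping is
    learned; here it is a hand-written stand-in."""
    if contact:
        likelihood = p_hit * occ_map + (1 - p_hit) * (1 - occ_map)
    else:
        likelihood = p_hit * (1 - occ_map) + (1 - p_hit) * occ_map
    belief = belief * likelihood
    return belief / belief.sum()

def motion_update(belief, dx, dy):
    """Deterministic motion model: shift the belief by the commanded gripper motion."""
    return np.roll(belief, shift=(dx, dy), axis=(0, 1))

# Interleave motion and contact observations to localize the gripper on the map.
for move, contact in [((0, 1), False), ((1, 0), True), ((0, 1), True)]:
    belief = motion_update(belief, *move)
    belief = observation_update(belief, contact, occupancy_map)
print("most likely gripper cell:", np.unravel_index(belief.argmax(), belief.shape))
```

Because the likelihood is conditioned on the visual map rather than on fixed world coordinates, the same filter transfers across different object sizes, shapes, and configurations by swapping in the corresponding map.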