Shuying Deng

How to Peel with a Knife: Aligning Fine-Grained Manipulation with Human Preference

Mar 03, 2026

Toru Lin, Shuying Deng, Zhao-Heng Yin, Pieter Abbeel, Jitendra Malik

Abstract: Many essential manipulation tasks, such as food preparation, surgery, and craftsmanship, remain intractable for autonomous robots. These tasks are characterized not only by contact-rich, force-sensitive dynamics, but also by their "implicit" success criteria: unlike pick-and-place, task quality in these domains is continuous and subjective (e.g., how well a potato is peeled), making quantitative evaluation and reward engineering difficult. We present a learning framework for such tasks, using peeling with a knife as a representative example. Our approach follows a two-stage pipeline: first, we learn a robust initial policy via force-aware data collection and imitation learning, enabling generalization across object variations; second, we refine the policy through preference-based finetuning using a learned reward model that combines quantitative task metrics with qualitative human feedback, aligning policy behavior with human notions of task quality. Using only 50-200 peeling trajectories, our system achieves over 90% average success rates on challenging produce including cucumbers, apples, and potatoes, with performance improving by up to 40% through preference-based finetuning. Remarkably, policies trained on a single produce category exhibit strong zero-shot generalization to unseen in-category instances and to out-of-distribution produce from different categories while maintaining over 90% success rates.

* Project page can be found at https://toruowo.github.io/peel
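The abstract does not say how the learned reward model turns human feedback into a reward. As an assumption, a common choice for pairwise preferences is a Bradley-Terry model, in which the probability that trajectory a is preferred over trajectory b is sigmoid(r(a) - r(b)). The sketch below fits a linear reward over hypothetical per-trajectory features (for example, a peel-coverage metric and a human quality score); the feature names, the linear form, and the function names are illustrative, not the authors' method.

```python
import math

def reward(features, weights):
    # Hypothetical linear reward: weighted sum of per-trajectory features,
    # e.g. [peel_coverage, human_quality_score].
    return sum(w * f for w, f in zip(weights, features))

def preference_prob(feat_a, feat_b, weights):
    # Bradley-Terry: P(a preferred over b) = sigmoid(r(a) - r(b)).
    diff = reward(feat_a, weights) - reward(feat_b, weights)
    return 1.0 / (1.0 + math.exp(-diff))

def fit_reward(pairs, dim, lr=0.5, epochs=200):
    # pairs: list of (preferred_features, rejected_features).
    # Minimizes the mean negative log-likelihood -log P(preferred > rejected)
    # with plain batch gradient descent.
    weights = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for win, lose in pairs:
            p = preference_prob(win, lose, weights)
            # Gradient of -log sigmoid(r_win - r_lose) w.r.t. each weight.
            for i in range(dim):
                grad[i] -= (1.0 - p) * (win[i] - lose[i])
        weights = [w - lr * g / len(pairs) for w, g in zip(weights, grad)]
    return weights

# Usage: each pair lists the preferred trajectory's features first.
pairs = [
    ([0.9, 0.8], [0.4, 0.3]),
    ([0.7, 0.6], [0.2, 0.5]),
    ([0.8, 0.2], [0.3, 0.1]),
]
weights = fit_reward(pairs, dim=2)
```

A reward fitted this way could then score rollouts during finetuning; how the paper actually combines quantitative metrics with human feedback is not specified in the abstract.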

DemoGen: Synthetic Demonstration Generation for Data-Efficient Visuomotor Policy Learning

Feb 24, 2025

Zhengrong Xue, Shuying Deng, Zhenyang Chen, Yixuan Wang, Zhecheng Yuan, Huazhe Xu

Abstract: Visuomotor policies have shown great promise in robotic manipulation but often require substantial amounts of human-collected data for effective performance. A key reason underlying the data demands is their limited spatial generalization capability, which necessitates extensive data collection across different object configurations. In this work, we present DemoGen, a low-cost, fully synthetic approach for automatic demonstration generation. Using only one human-collected demonstration per task, DemoGen generates spatially augmented demonstrations by adapting the demonstrated action trajectory to novel object configurations. Visual observations are synthesized by leveraging 3D point clouds as the modality and rearranging the subjects in the scene via 3D editing. Empirically, DemoGen significantly enhances policy performance across a diverse range of real-world manipulation tasks, showing its applicability even in challenging scenarios involving deformable objects, dexterous hand end-effectors, and bimanual platforms. Furthermore, DemoGen can be extended to enable additional out-of-distribution capabilities, including disturbance resistance and obstacle avoidance.

* Project website: https://demo-generation.github.io
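To illustrate the idea of adapting one demonstration to a new object configuration, the sketch below applies the same rigid transform to end-effector waypoints and to the object's point cloud, so actions and observations stay consistent. It is a simplified assumption, not DemoGen's actual procedure: poses are planar (x, y, yaw), only waypoint positions are moved, and the whole trajectory is treated as rigidly attached to the object. All function names are hypothetical.

```python
import math

def planar_transform(src_pose, dst_pose):
    # Poses are (x, y, yaw). Returns a function that maps a 3D point attached
    # to an object at src_pose to where it would lie with the object at dst_pose.
    sx, sy, syaw = src_pose
    dx, dy, dyaw = dst_pose
    c, s = math.cos(dyaw - syaw), math.sin(dyaw - syaw)

    def apply(point):
        x, y, z = point
        # Express the point relative to the source object origin,
        # rotate by the yaw difference, then re-anchor at the target origin.
        rx, ry = x - sx, y - sy
        return (dx + c * rx - s * ry, dy + s * rx + c * ry, z)

    return apply

def augment_demo(waypoints, object_points, src_pose, dst_pose):
    # Move action waypoints and the object's point cloud with the same
    # transform, producing one synthetic (actions, observation) pair.
    transform = planar_transform(src_pose, dst_pose)
    return [transform(p) for p in waypoints], [transform(p) for p in object_points]

# Usage: one source demo, several hypothetical target object poses.
waypoints = [(1.0, 0.0, 0.5), (1.0, 0.0, 0.1)]
cloud = [(1.0, 0.0, 0.0), (1.1, 0.0, 0.0)]
targets = [(1.0, 0.0, math.pi / 2), (0.5, 0.5, 0.0)]
synthetic = [augment_demo(waypoints, cloud, (0.0, 0.0, 0.0), t) for t in targets]
```

Each target pose yields one synthetic demonstration from the single human demonstration; a full system would also need to handle free-space motion and the non-object parts of the scene.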

RiEMann: Near Real-Time SE(3)-Equivariant Robot Manipulation without Point Cloud Segmentation

Mar 28, 2024

Chongkai Gao, Zhengrong Xue, Shuying Deng, Tianhai Liang, Siqi Yang, Lin Shao, Huazhe Xu

Abstract: We present RiEMann, an end-to-end near Real-time SE(3)-Equivariant Robot Manipulation imitation learning framework from scene point cloud input. Compared to previous methods that rely on descriptor field matching, RiEMann directly predicts the target poses of objects for manipulation without any object segmentation. RiEMann learns a manipulation task from scratch with 5 to 10 demonstrations, generalizes to unseen SE(3) transformations and instances of target objects, resists visual interference of distracting objects, and follows the near real-time pose change of the target object. The scalable action space of RiEMann facilitates the addition of custom equivariant actions such as the direction of turning the faucet, which makes articulated object manipulation possible for RiEMann. In simulation and real-world 6-DOF robot manipulation experiments, we test RiEMann on 5 categories of manipulation tasks with a total of 25 variants and show that RiEMann outperforms baselines in both task success rates and SE(3) geodesic distance errors on predicted poses (reduced by 68.6%), and achieves a 5.4 frames per second (FPS) network inference speed. Code and video results are available at https://riemann-web.github.io/.
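The abstract reports SE(3) geodesic distance errors on predicted poses without defining the metric. A common convention, assumed here, splits the error into the rotation geodesic angle arccos((trace(R1^T R2) - 1) / 2) and the Euclidean translation distance; how the paper weights or combines the two is not stated. The helper names below are hypothetical.

```python
import math

def matmul_t(a, b):
    # Computes a^T b for 3x3 matrices stored as nested lists.
    return [[sum(a[k][i] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation_geodesic(r1, r2):
    # Angle (radians) of the relative rotation R1^T R2; the cosine is
    # clamped to [-1, 1] to guard against floating-point drift.
    m = matmul_t(r1, r2)
    trace = m[0][0] + m[1][1] + m[2][2]
    return math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))

def pose_error(pose1, pose2):
    # Each pose is (R, t) with R a 3x3 rotation matrix and t a 3-vector.
    (r1, t1), (r2, t2) = pose1, pose2
    return rotation_geodesic(r1, r2), math.dist(t1, t2)

def rot_z(theta):
    # Rotation about the z-axis, used here to build test poses.
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

# Usage: prediction rotated 90 degrees about z and offset 1 unit along z.
rot_err, trans_err = pose_error((rot_z(0.0), (0.0, 0.0, 0.0)),
                                (rot_z(math.pi / 2), (0.0, 0.0, 1.0)))
```

Keeping the two components separate avoids mixing radians with meters; any single scalar error would require an explicit weighting choice.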
