Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Haoran Ding

Imagination at Inference: Synthesizing In-Hand Views for Robust Visuomotor Policy Inference

Sep 19, 2025

Haoran Ding, Anqing Duan, Zezhou Sun, Dezhen Song, Yoshihiko Nakamura

Abstract:Visual observations from different viewpoints can significantly influence the performance of visuomotor policies in robotic manipulation. Among these, egocentric (in-hand) views often provide crucial information for precise control. However, in some applications, equipping robots with dedicated in-hand cameras may pose challenges due to hardware constraints, system complexity, and cost. In this work, we propose to endow robots with imaginative perception - enabling them to 'imagine' in-hand observations from agent views at inference time. We achieve this via novel view synthesis (NVS), leveraging a fine-tuned diffusion model conditioned on the relative pose between the agent and in-hand views cameras. Specifically, we apply LoRA-based fine-tuning to adapt a pretrained NVS model (ZeroNVS) to the robotic manipulation domain. We evaluate our approach on both simulation benchmarks (RoboMimic and MimicGen) and real-world experiments using a Unitree Z1 robotic arm for a strawberry picking task. Results show that synthesized in-hand views significantly enhance policy inference, effectively recovering the performance drop caused by the absence of real in-hand cameras. Our method offers a scalable and hardware-light solution for deploying robust visuomotor policies, highlighting the potential of imaginative visual reasoning in embodied agents.

* Submitted to IEEE for possible publication, under review

Via

Access Paper or Ask Questions

Towards Safe Imitation Learning via Potential Field-Guided Flow Matching

Aug 12, 2025

Haoran Ding, Anqing Duan, Zezhou Sun, Leonel Rozo, Noémie Jaquier, Dezhen Song, Yoshihiko Nakamura

Figure 1 for Towards Safe Imitation Learning via Potential Field-Guided Flow Matching

Figure 2 for Towards Safe Imitation Learning via Potential Field-Guided Flow Matching

Figure 3 for Towards Safe Imitation Learning via Potential Field-Guided Flow Matching

Figure 4 for Towards Safe Imitation Learning via Potential Field-Guided Flow Matching

Abstract:Deep generative models, particularly diffusion and flow matching models, have recently shown remarkable potential in learning complex policies through imitation learning. However, the safety of generated motions remains overlooked, particularly in complex environments with inherent obstacles. In this work, we address this critical gap by proposing Potential Field-Guided Flow Matching Policy (PF2MP), a novel approach that simultaneously learns task policies and extracts obstacle-related information, represented as a potential field, from the same set of successful demonstrations. During inference, PF2MP modulates the flow matching vector field via the learned potential field, enabling safe motion generation. By leveraging these complementary fields, our approach achieves improved safety without compromising task success across diverse environments, such as navigation tasks and robotic manipulation scenarios. We evaluate PF2MP in both simulation and real-world settings, demonstrating its effectiveness in task space and joint space control. Experimental results demonstrate that PF2MP enhances safety, achieving a significant reduction of collisions compared to baseline policies. This work paves the way for safer motion generation in unstructured and obstaclerich environments.

* 8 pages, 6 figures, Accepted to IROS 2025

Via

Access Paper or Ask Questions

Fast and Robust Visuomotor Riemannian Flow Matching Policy

Dec 14, 2024

Haoran Ding, Noémie Jaquier, Jan Peters, Leonel Rozo

Figure 1 for Fast and Robust Visuomotor Riemannian Flow Matching Policy

Figure 2 for Fast and Robust Visuomotor Riemannian Flow Matching Policy

Figure 3 for Fast and Robust Visuomotor Riemannian Flow Matching Policy

Figure 4 for Fast and Robust Visuomotor Riemannian Flow Matching Policy

Abstract:Diffusion-based visuomotor policies excel at learning complex robotic tasks by effectively combining visual data with high-dimensional, multi-modal action distributions. However, diffusion models often suffer from slow inference due to costly denoising processes or require complex sequential training arising from recent distilling approaches. This paper introduces Riemannian Flow Matching Policy (RFMP), a model that inherits the easy training and fast inference capabilities of flow matching (FM). Moreover, RFMP inherently incorporates geometric constraints commonly found in realistic robotic applications, as the robot state resides on a Riemannian manifold. To enhance the robustness of RFMP, we propose Stable RFMP (SRFMP), which leverages LaSalle's invariance principle to equip the dynamics of FM with stability to the support of a target Riemannian distribution. Rigorous evaluation on eight simulated and real-world tasks show that RFMP successfully learns and synthesizes complex sensorimotor policies on Euclidean and Riemannian spaces with efficient training and inference phases, outperforming Diffusion Policies while remaining competitive with Consistency Policies.

* 14 pages, 10 figures, 9 tables, project website: https://sites.google.com/view/rfmp

Via

Access Paper or Ask Questions