Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sethu Vijayakumar

Fast Flow-based Visuomotor Policies via Conditional Optimal Transport Couplings

May 02, 2025

Andreas Sochopoulos, Nikolay Malkin, Nikolaos Tsagkas, João Moura, Michael Gienger, Sethu Vijayakumar

Abstract:Diffusion and flow matching policies have recently demonstrated remarkable performance in robotic applications by accurately capturing multimodal robot trajectory distributions. However, their computationally expensive inference, due to the numerical integration of an ODE or SDE, limits their applicability as real-time controllers for robots. We introduce a methodology that utilizes conditional Optimal Transport couplings between noise and samples to enforce straight solutions in the flow ODE for robot action generation tasks. We show that naively coupling noise and samples fails in conditional tasks and propose incorporating condition variables into the coupling process to improve few-step performance. The proposed few-step policy achieves a 4% higher success rate with a 10x speed-up compared to Diffusion Policy on a diverse set of simulation tasks. Moreover, it produces high-quality and diverse action trajectories within 1-2 steps on a set of real-world robot tasks. Our method also retains the same training complexity as Diffusion Policy and vanilla Flow Matching, in contrast to distillation-based approaches.

Via

Access Paper or Ask Questions

ContactFusion: Stochastic Poisson Surface Maps from Visual and Contact Sensing

Mar 20, 2025

Aditya Kamireddypalli, Joao Moura, Russell Buchanan, Sethu Vijayakumar, Subramanian Ramamoorthy

Abstract:Robust and precise robotic assembly entails insertion of constituent components. Insertion success is hindered when noise in scene understanding exceeds tolerance limits, especially when fabricated with tight tolerances. In this work, we propose ContactFusion which combines global mapping with local contact information, fusing point clouds with force sensing. Our method entails a Rejection Sampling based contact occupancy sensing procedure which estimates contact locations on the end-effector from Force/Torque sensing at the wrist. We demonstrate how to fuse contact with visual information into a Stochastic Poisson Surface Map (SPSMap) - a map representation that can be updated with the Stochastic Poisson Surface Reconstruction (SPSR) algorithm. We first validate the contact occupancy sensor in simulation and show its ability to detect the contact location on the robot from force sensing information. Then, we evaluate our method in a peg-in-hole task, demonstrating an improvement in the hole pose estimate with the fusion of the contact information with the SPSMap.

Via

Access Paper or Ask Questions

Learning Visuotactile Estimation and Control for Non-prehensile Manipulation under Occlusions

Dec 17, 2024

Juan Del Aguila Ferrandis, João Moura, Sethu Vijayakumar

Figure 1 for Learning Visuotactile Estimation and Control for Non-prehensile Manipulation under Occlusions

Figure 2 for Learning Visuotactile Estimation and Control for Non-prehensile Manipulation under Occlusions

Figure 3 for Learning Visuotactile Estimation and Control for Non-prehensile Manipulation under Occlusions

Figure 4 for Learning Visuotactile Estimation and Control for Non-prehensile Manipulation under Occlusions

Abstract:Manipulation without grasping, known as non-prehensile manipulation, is essential for dexterous robots in contact-rich environments, but presents many challenges relating with underactuation, hybrid-dynamics, and frictional uncertainty. Additionally, object occlusions in a scenario of contact uncertainty and where the motion of the object evolves independently from the robot becomes a critical problem, which previous literature fails to address. We present a method for learning visuotactile state estimators and uncertainty-aware control policies for non-prehensile manipulation under occlusions, by leveraging diverse interaction data from privileged policies trained in simulation. We formulate the estimator within a Bayesian deep learning framework, to model its uncertainty, and then train uncertainty-aware control policies by incorporating the pre-learned estimator into the reinforcement learning (RL) loop, both of which lead to significantly improved estimator and policy performance. Therefore, unlike prior non-prehensile research that relies on complex external perception set-ups, our method successfully handles occlusions after sim-to-real transfer to robotic hardware with a simple onboard camera. See our video: https://youtu.be/hW-C8i_HWgs.

* Conference on Robot Learning (CoRL 2024)

Via

Access Paper or Ask Questions

Learning Few-Shot Object Placement with Intra-Category Transfer

Nov 05, 2024

Adrian Röfer, Russell Buchanan, Max Argus, Sethu Vijayakumar, Abhinav Valada

Figure 1 for Learning Few-Shot Object Placement with Intra-Category Transfer

Figure 2 for Learning Few-Shot Object Placement with Intra-Category Transfer

Figure 3 for Learning Few-Shot Object Placement with Intra-Category Transfer

Figure 4 for Learning Few-Shot Object Placement with Intra-Category Transfer

Abstract:Efficient learning from demonstration for long-horizon tasks remains an open challenge in robotics. While significant effort has been directed toward learning trajectories, a recent resurgence of object-centric approaches has demonstrated improved sample efficiency, enabling transferable robotic skills. Such approaches model tasks as a sequence of object poses over time. In this work, we propose a scheme for transferring observed object arrangements to novel object instances by learning these arrangements on canonical class frames. We then employ this scheme to enable a simple yet effective approach for training models from as few as five demonstrations to predict arrangements of a wide range of objects including tableware, cutlery, furniture, and desk spaces. We propose a method for optimizing the learned models to enables efficient learning of tasks such as setting a table or tidying up an office with intra-category transfer, even in the presence of distractors. We present extensive experimental results in simulation and on a real robotic system for table setting which, based on human evaluations, scored 73.3% compared to a human baseline. We make the code and trained models publicly available at http://oplict.cs.uni-freiburg.de.

* 8 pages, 7 figures, 2 tables, submitted to RA-L

Via

Access Paper or Ask Questions

An Efficient Representation of Whole-body Model Predictive Control for Online Compliant Dual-arm Mobile Manipulation

Oct 30, 2024

Wenqian Du, Ran Long, João Moura, Jiayi Wang, Saeid Samadi, Sethu Vijayakumar

Abstract:Dual-arm mobile manipulators can transport and manipulate large-size objects with simple end-effectors. To interact with dynamic environments with strict safety and compliance requirements, achieving whole-body motion planning online while meeting various hard constraints for such highly redundant mobile manipulators poses a significant challenge. We tackle this challenge by presenting an efficient representation of whole-body motion trajectories within our bilevel model-based predictive control (MPC) framework. We utilize B\'ezier-curve parameterization to represent the optimized collision-free trajectories of two collaborating end-effectors in the first MPC, facilitating fast long-horizon object-oriented motion planning in SE(3) while considering approximated feasibility constraints. This approach is further applied to parameterize whole-body trajectories in the second MPC for whole-body motion generation with predictive admittance control in a relatively short horizon while satisfying whole-body hard constraints. This representation enables two MPCs with continuous properties, thereby avoiding inaccurate model-state transition and dense decision-variable settings in existing MPCs using the discretization method. It strengthens the online execution of the bilevel MPC framework in high-dimensional space and facilitates the generation of consistent commands for our hybrid position/velocity-controlled robot. The simulation comparisons and real-world experiments demonstrate the efficiency and robustness of this approach in various scenarios for static and dynamic obstacle avoidance, and compliant interaction control with the manipulated object and external disturbances.

* Under Review for IEEE Transactions on Robotics

Via

Access Paper or Ask Questions

Explicit Contact Optimization in Whole-Body Contact-Rich Manipulation

Aug 28, 2024

Victor Leve, João Moura, Namiko Saito, Steve Tonneau, Sethu Vijayakumar

Abstract:Humans can exploit contacts anywhere on their body surface to manipulate large and heavy items, objects normally out of reach or multiple objects at once. However, such manipulation through contacts using the whole surface of the body remains extremely challenging to achieve on robots. This can be labelled as Whole-Body Contact-Rich Manipulation (WBCRM) problem. In addition to the high-dimensionality of the Contact-Rich Manipulation problem due to the combinatorics of contact modes, admitting contact creation anywhere on the body surface adds complexity, which hinders planning of manipulation within a reasonable time. We address this computational problem by formulating the contact and motion planning of planar WBCRM as hierarchical continuous optimization problems. To enable this formulation, we propose a novel continuous explicit representation of the robot surface, that we believe to be foundational for future research using continuous optimization for WBCRM. Our results demonstrate a significant improvement of convergence, planning time and feasibility - with, on the average, 99% less iterations and 96% reduction in time to find a solution over considered scenarios, without recourse to prone-to-failure trajectory refinement steps.

Via

Access Paper or Ask Questions

Learning Precise Affordances from Egocentric Videos for Robotic Manipulation

Aug 19, 2024

Gen Li, Nikolaos Tsagkas, Jifei Song, Ruaridh Mon-Williams, Sethu Vijayakumar, Kun Shao, Laura Sevilla-Lara

Figure 1 for Learning Precise Affordances from Egocentric Videos for Robotic Manipulation

Figure 2 for Learning Precise Affordances from Egocentric Videos for Robotic Manipulation

Figure 3 for Learning Precise Affordances from Egocentric Videos for Robotic Manipulation

Figure 4 for Learning Precise Affordances from Egocentric Videos for Robotic Manipulation

Abstract:Affordance, defined as the potential actions that an object offers, is crucial for robotic manipulation tasks. A deep understanding of affordance can lead to more intelligent AI systems. For example, such knowledge directs an agent to grasp a knife by the handle for cutting and by the blade when passing it to someone. In this paper, we present a streamlined affordance learning system that encompasses data collection, effective model training, and robot deployment. First, we collect training data from egocentric videos in an automatic manner. Different from previous methods that focus only on the object graspable affordance and represent it as coarse heatmaps, we cover both graspable (e.g., object handles) and functional affordances (e.g., knife blades, hammer heads) and extract data with precise segmentation masks. We then propose an effective model, termed Geometry-guided Affordance Transformer (GKT), to train on the collected data. GKT integrates an innovative Depth Feature Injector (DFI) to incorporate 3D shape and geometric priors, enhancing the model's understanding of affordances. To enable affordance-oriented manipulation, we further introduce Aff-Grasp, a framework that combines GKT with a grasp generation model. For comprehensive evaluation, we create an affordance evaluation dataset with pixel-wise annotations, and design real-world tasks for robot experiments. The results show that GKT surpasses the state-of-the-art by 15.9% in mIoU, and Aff-Grasp achieves high success rates of 95.5% in affordance prediction and 77.1% in successful grasping among 179 trials, including evaluations with seen, unseen objects, and cluttered scenes.

* Project page: https://reagan1311.github.io/affgrasp

Via

Access Paper or Ask Questions

NAS: N-step computation of All Solutions to the footstep planning problem

Jul 17, 2024

Jiayi Wang, Saeid Samadi, Hefan Wang, Pierre Fernbach, Olivier Stasse, Sethu Vijayakumar, Steve Tonneau

Figure 1 for NAS: N-step computation of All Solutions to the footstep planning problem

Figure 2 for NAS: N-step computation of All Solutions to the footstep planning problem

Figure 3 for NAS: N-step computation of All Solutions to the footstep planning problem

Figure 4 for NAS: N-step computation of All Solutions to the footstep planning problem

Abstract:How many ways are there to climb a staircase in a given number of steps? Infinitely many, if we focus on the continuous aspect of the problem. A finite, possibly large number if we consider the discrete aspect, i.e. on which surface which effectors are going to step and in what order. We introduce NAS, an algorithm that considers both aspects simultaneously and computes all the possible solutions to such a contact planning problem, under standard assumptions. To our knowledge NAS is the first algorithm to produce a globally optimal policy, efficiently queried in real time for planning the next footsteps of a humanoid robot. Our empirical results (in simulation and on the Talos platform) demonstrate that, despite the theoretical exponential complexity, optimisations reduce the practical complexity of NAS to a manageable bilinear form, maintaining completeness guarantees and enabling efficient GPU parallelisation. NAS is demonstrated in a variety of scenarios for the Talos robot, both in simulation and on the hardware platform. Future work will focus on further reducing computation times and extending the algorithm's applicability beyond gaited locomotion. Our companion video is available at https://youtu.be/Shkf8PyDg4g

* Submitted to Humanoids 2024

Via

Access Paper or Ask Questions

Learning Deep Dynamical Systems using Stable Neural ODEs

Apr 16, 2024

Andreas Sochopoulos, Michael Gienger, Sethu Vijayakumar

Abstract:Learning complex trajectories from demonstrations in robotic tasks has been effectively addressed through the utilization of Dynamical Systems (DS). State-of-the-art DS learning methods ensure stability of the generated trajectories; however, they have three shortcomings: a) the DS is assumed to have a single attractor, which limits the diversity of tasks it can achieve, b) state derivative information is assumed to be available in the learning process and c) the state of the DS is assumed to be measurable at inference time. We propose a class of provably stable latent DS with possibly multiple attractors, that inherit the training methods of Neural Ordinary Differential Equations, thus, dropping the dependency on state derivative information. A diffeomorphic mapping for the output and a loss that captures time-invariant trajectory similarity are proposed. We validate the efficacy of our approach through experiments conducted on a public dataset of handwritten shapes and within a simulated object manipulation task.

* Paper currently under review

Via

Access Paper or Ask Questions

Learning Goal-Directed Object Pushing in Cluttered Scenes with Location-Based Attention

Mar 26, 2024

Nils Dengler, Juan Del Aguila Ferrandis, João Moura, Sethu Vijayakumar, Maren Bennewitz

Abstract:Non-prehensile planar pushing is a challenging task due to its underactuated nature with hybrid-dynamics, where a robot needs to reason about an object's long-term behaviour and contact-switching, while being robust to contact uncertainty. The presence of clutter in the environment further complicates this task, introducing the need to include more sophisticated spatial analysis to avoid collisions. Building upon prior work on reinforcement learning (RL) with multimodal categorical exploration for planar pushing, in this paper we incorporate location-based attention to enable robust navigation through clutter. Unlike previous RL literature addressing this obstacle avoidance pushing task, our framework requires no predefined global paths and considers the target orientation of the manipulated object. Our results demonstrate that the learned policies successfully navigate through a wide range of complex obstacle configurations, including dynamic obstacles, with smooth motions, achieving the desired target object pose. We also validate the transferability of the learned policies to robotic hardware using the KUKA iiwa robot arm.

* Submitted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Via

Access Paper or Ask Questions