Animesh Garg

Fast-Grasp'D: Dexterous Multi-finger Grasp Generation Through Differentiable Simulation

Jun 13, 2023
Dylan Turpin, Tao Zhong, Shutong Zhang, Guanglei Zhu, Jingzhou Liu, Ritvik Singh, Eric Heiden, Miles Macklin, Stavros Tsogkas, Sven Dickinson, Animesh Garg

Multi-finger grasping relies on high-quality training data, which is hard to obtain: human data is hard to transfer and synthetic data relies on simplifying assumptions that reduce grasp quality. By making grasp simulation differentiable and contact dynamics amenable to gradient-based optimization, we accelerate the search for high-quality grasps with fewer limiting assumptions. We present Grasp'D-1M: a large-scale dataset for multi-finger robotic grasping, synthesized with Fast-Grasp'D, a novel differentiable grasping simulator. Grasp'D-1M contains one million training examples for three robotic hands (three-, four- and five-fingered), each with multimodal visual inputs (RGB + depth + segmentation, available in mono and stereo). Grasp synthesis with Fast-Grasp'D is 10x faster than GraspIt! and 20x faster than the prior Grasp'D differentiable simulator. Generated grasps are more stable and contact-rich than GraspIt! grasps, regardless of the distance threshold used for contact generation. We validate the usefulness of our dataset by retraining an existing vision-based grasping pipeline on Grasp'D-1M and showing a dramatic increase in model performance: the retrained model predicts grasps with 30% more contact, a 33% higher epsilon metric, and 35% lower simulated displacement. Additional details at https://dexgrasp.github.io.
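
To make the gradient-based search concrete, here is a minimal sketch of differentiable grasp synthesis on a toy object. It is not the Fast-Grasp'D simulator: the signed-distance sphere, the free-floating fingertip points, and the loss weights are all illustrative assumptions.

```python
# Minimal sketch of gradient-based grasp synthesis (NOT the Fast-Grasp'D
# simulator): optimize fingertip positions against a differentiable
# contact cost on a toy sphere object.
import torch

def signed_distance_sphere(points, center, radius):
    """Signed distance from points to a sphere: < 0 inside, > 0 outside."""
    return torch.linalg.norm(points - center, dim=-1) - radius

# Toy "hand": five free fingertip points initialized around the object.
torch.manual_seed(0)
fingertips = torch.randn(5, 3, requires_grad=True)
center, radius = torch.zeros(3), 0.5

opt = torch.optim.Adam([fingertips], lr=1e-2)
for step in range(500):
    sd = signed_distance_sphere(fingertips, center, radius)
    contact_loss = sd.clamp(min=0).pow(2).sum()      # pull fingertips onto the surface
    penetration_loss = sd.clamp(max=0).pow(2).sum()  # push them back out of the object
    spread_loss = -torch.pdist(fingertips).mean()    # spread contacts around the object
    loss = contact_loss + 10.0 * penetration_loss + 0.1 * spread_loss
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final signed distances:", signed_distance_sphere(fingertips, center, radius))
```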

Self-Supervised Learning of Action Affordances as Interaction Modes

May 27, 2023
Liquan Wang, Nikita Dvornik, Rafael Dubeau, Mayank Mittal, Animesh Garg

When humans perform a task with an articulated object, they interact with the object in only a handful of ways, while the space of all possible interactions is nearly endless. This is because humans have prior knowledge about which interactions are likely to succeed, i.e., to open a new door we first try the handle. While learning such priors without supervision is easy for humans, it is notoriously hard for machines. In this work, we tackle unsupervised learning of priors over useful interactions with articulated objects, which we call interaction modes. In contrast to prior art, we use no supervision or privileged information; we only assume access to the depth sensor in the simulator to learn the interaction modes. More precisely, we define a successful interaction as one that changes the visual environment substantially, and we learn a generative model of such interactions that can be conditioned on the desired goal state of the object. In our experiments, we show that our model covers most of the human interaction modes, outperforms existing state-of-the-art methods for affordance learning, and generalizes to objects never seen during training. Additionally, we show promising results in the goal-conditional setup, where our model can be quickly fine-tuned to perform a given task. Supplementary material: https://actaim.github.io.

* 2023 International Conference on Robotics and Automation  
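
As a rough illustration of "a generative model of interactions conditioned on a goal state," the sketch below uses a small conditional VAE over low-dimensional interaction parameters. The architecture, dimensions, and loss weighting are assumptions for exposition, not the paper's model.

```python
# Hedged sketch: a conditional VAE over interaction parameters (e.g., a
# contact point plus a push direction), conditioned on observation and
# goal embeddings. All sizes and names are illustrative.
import torch
import torch.nn as nn

class InteractionCVAE(nn.Module):
    def __init__(self, obs_dim=128, goal_dim=128, act_dim=6, z_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(obs_dim + goal_dim + act_dim, 256),
                                 nn.ReLU(), nn.Linear(256, 2 * z_dim))
        self.dec = nn.Sequential(nn.Linear(obs_dim + goal_dim + z_dim, 256),
                                 nn.ReLU(), nn.Linear(256, act_dim))

    def forward(self, obs, goal, act):
        mu, logvar = self.enc(torch.cat([obs, goal, act], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterize
        recon = self.dec(torch.cat([obs, goal, z], -1))
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return (recon - act).pow(2).sum(-1).mean() + 1e-3 * kl  # ELBO-style loss
```

At test time one would sample z from the prior and decode diverse interaction proposals for a given observation and goal, which is the pattern the goal-conditional setup suggests.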

SlotDiffusion: Object-Centric Generative Modeling with Diffusion Models

May 18, 2023
Ziyi Wu, Jingyu Hu, Wuyue Lu, Igor Gilitschenski, Animesh Garg

Object-centric learning aims to represent visual data with a set of object entities (a.k.a. slots), providing structured representations that enable systematic generalization. Leveraging advanced architectures like Transformers, recent approaches have made significant progress in unsupervised object discovery. In addition, slot-based representations hold great potential for generative modeling, such as controllable image generation and object manipulation in image editing. However, current slot-based methods often produce blurry images and distorted objects, exhibiting poor generative modeling capabilities. In this paper, we focus on improving slot-to-image decoding, a crucial aspect for high-quality visual generation. We introduce SlotDiffusion -- an object-centric Latent Diffusion Model (LDM) designed for both image and video data. Thanks to the powerful modeling capacity of LDMs, SlotDiffusion surpasses previous slot models in unsupervised object segmentation and visual generation across six datasets. Furthermore, our learned object features can be utilized by existing object-centric dynamics models, improving video prediction quality and downstream temporal reasoning tasks. Finally, we demonstrate the scalability of SlotDiffusion to unconstrained real-world datasets such as PASCAL VOC and COCO, when integrated with self-supervised pre-trained image encoders.

* Project page: https://slotdiffusion.github.io/. An earlier version of this work appeared at the ICLR 2023 Workshop on Neurosymbolic Generative Models: https://nesygems.github.io/assets/pdf/papers/SlotDiffusion.pdf 
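
The key mechanism is conditioning a latent denoiser on object slots. Below is a hedged sketch of one training step in that style; the module sizes, noise-schedule term, and cross-attention design are illustrative, not SlotDiffusion's exact architecture.

```python
# Illustrative sketch of slot-conditioned denoising, the core of a
# slot-based latent diffusion decoder. Shapes are assumptions.
import torch
import torch.nn as nn

class SlotConditionedDenoiser(nn.Module):
    def __init__(self, latent_dim=64, slot_dim=64, n_heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(latent_dim, n_heads,
                                                kdim=slot_dim, vdim=slot_dim,
                                                batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(latent_dim, 256), nn.GELU(),
                                 nn.Linear(256, latent_dim))

    def forward(self, noisy_latents, slots):
        # Latent tokens attend to the object slots, then predict the noise.
        h, _ = self.cross_attn(noisy_latents, slots, slots)
        return self.mlp(h)

# One diffusion training step: add noise at a random timestep, predict it.
denoiser = SlotConditionedDenoiser()
latents = torch.randn(8, 16 * 16, 64)   # e.g. a 16x16 grid of VAE latents
slots = torch.randn(8, 7, 64)           # 7 object slots from a slot encoder
alpha_bar = torch.rand(8, 1, 1)         # noise-schedule term for sampled t
noise = torch.randn_like(latents)
noisy = alpha_bar.sqrt() * latents + (1 - alpha_bar).sqrt() * noise
loss = (denoiser(noisy, slots) - noise).pow(2).mean()
loss.backward()
```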

Learning Achievement Structure for Structured Exploration in Domains with Sparse Reward

Apr 30, 2023
Zihan Zhou, Animesh Garg

We propose Structured Exploration with Achievements (SEA), a multi-stage reinforcement learning algorithm designed for achievement-based environments, a particular type of environment with an internal achievement set. SEA first uses offline data to learn a representation of the known achievements with a determinant loss function, then recovers the dependency graph of the learned achievements with a heuristic algorithm, and finally interacts with the environment online to learn policies that master known achievements and explore new ones with a controller built on the recovered dependency graph. We empirically demonstrate that SEA accurately recovers the achievement structure and improves exploration in hard domains such as Crafter, which is procedurally generated and has high-dimensional observations such as images.

* Published as a conference paper at ICLR 2023 
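
The middle stage, recovering a dependency graph of achievements with a heuristic, can be illustrated with a simple order-based rule: treat "a depends-on b" as an edge when a is unlocked before b in every episode that contains b. The episode data and the exact rule below are illustrative stand-ins, not SEA's algorithm.

```python
# Toy heuristic for recovering an achievement dependency graph from
# unlock orders (illustrative data in the spirit of Crafter).
from itertools import permutations

episodes = [
    ["wood", "table", "wood_pickaxe", "stone"],
    ["wood", "wood_pickaxe", "stone", "stone_pickaxe"],
    ["wood", "table", "wood_pickaxe", "stone", "stone_pickaxe"],
]

achievements = {a for ep in episodes for a in ep}
edges = set()
for a, b in permutations(achievements, 2):
    eps_with_b = [ep for ep in episodes if b in ep]
    if eps_with_b and all(a in ep and ep.index(a) < ep.index(b)
                          for ep in eps_with_b):
        edges.add((a, b))  # a precedes b whenever b is unlocked

# Keep only direct dependencies: drop a->c if some a->b->c path exists.
direct = {(a, c) for (a, c) in edges
          if not any((a, b) in edges and (b, c) in edges for b in achievements)}
print(sorted(direct))
```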

StepFormer: Self-supervised Step Discovery and Localization in Instructional Videos

Apr 26, 2023
Nikita Dvornik, Isma Hadji, Ran Zhang, Konstantinos G. Derpanis, Animesh Garg, Richard P. Wildes, Allan D. Jepson

Instructional videos are an important resource for learning procedural tasks from human demonstrations. However, the instruction steps in such videos are typically short and sparse, with most of the video being irrelevant to the procedure. This motivates the need to temporally localize the instruction steps in such videos, a task called key-step localization. Traditional methods for key-step localization require video-level human annotations and thus do not scale to large datasets. In this work, we tackle the problem with no human supervision and introduce StepFormer, a self-supervised model that discovers and localizes instruction steps in a video. StepFormer is a transformer decoder that attends to the video with learnable queries and produces a sequence of slots capturing the key steps in the video. We train our system on a large dataset of instructional videos, using their automatically generated subtitles as the only source of supervision. In particular, we supervise our system with a sequence of text narrations using an order-aware loss function that filters out irrelevant phrases. We show that our model outperforms all previous unsupervised and weakly supervised approaches on step detection and localization by a large margin on three challenging benchmarks. Moreover, our model demonstrates an emergent ability to solve zero-shot multi-step localization, outperforming all relevant baselines on this task.

* CVPR'23 
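
The core interface, a transformer decoder whose learnable queries attend to frame features and emit an ordered sequence of step slots, can be sketched as follows. The dimensions and layer counts are assumptions, and the order-aware training loss against narration embeddings is omitted.

```python
# Sketch of a StepFormer-style interface: K learnable queries attend to
# video features and produce K ordered step slots.
import torch
import torch.nn as nn

class StepDecoder(nn.Module):
    def __init__(self, d_model=256, n_queries=32, n_layers=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)

    def forward(self, video_feats):            # (B, T, d_model) frame features
        B = video_feats.shape[0]
        q = self.queries.unsqueeze(0).expand(B, -1, -1)
        return self.decoder(q, video_feats)    # (B, K, d_model) step slots

steps = StepDecoder()(torch.randn(2, 512, 256))
print(steps.shape)  # torch.Size([2, 32, 256])
```

Training would then align these slots to a sequence of narration embeddings with an order-preserving matching loss, per the abstract.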

$\Delta$-Networks for Efficient Model Patching

Mar 26, 2023
Chaitanya Devaguptapu, Samarth Sinha, K J Joseph, Vineeth N Balasubramanian, Animesh Garg

Models pre-trained on large-scale datasets are often finetuned to support newer tasks and datasets that arrive over time. This process necessitates storing a copy of the model for each task the pre-trained model is finetuned to. Building on recent model patching work, we propose $\Delta$-Patching for finetuning neural network models in an efficient manner, without the need to store model copies. We propose a simple and lightweight method called $\Delta$-Networks to achieve this objective. Our comprehensive experiments across settings and architecture variants show that $\Delta$-Networks outperform earlier model patching work while requiring only a fraction of parameters to be trained. We also show that this approach can be used for other problem settings such as transfer learning and zero-shot domain adaptation, as well as other tasks such as detection and segmentation.
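
The storage-saving pattern behind model patching can be sketched generically: keep one frozen base model and train only a per-task delta on top of it. This is an illustration of the general idea, not the paper's exact $\Delta$-Networks parameterization; in practice the delta must be constrained (e.g., sparse or low-rank) for the storage savings to materialize.

```python
# Generic delta-patching sketch: a shared frozen base plus a trainable
# per-task weight delta, so only the delta is stored per task.
import torch
import torch.nn as nn

base = nn.Linear(512, 512)
for p in base.parameters():
    p.requires_grad_(False)          # the base stays frozen, shared across tasks

class DeltaLinear(nn.Module):
    """Computes y = x (W + dW)^T + (b + db); only the delta is trainable."""
    def __init__(self, frozen: nn.Linear):
        super().__init__()
        self.frozen = frozen
        self.dW = nn.Parameter(torch.zeros_like(frozen.weight))
        self.db = nn.Parameter(torch.zeros_like(frozen.bias))

    def forward(self, x):
        return nn.functional.linear(x, self.frozen.weight + self.dW,
                                    self.frozen.bias + self.db)

patched = DeltaLinear(base)          # per task, persist only (dW, db)
out = patched(torch.randn(4, 512))
```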

Errors are Useful Prompts: Instruction Guided Task Programming with Verifier-Assisted Iterative Prompting

Mar 24, 2023
Marta Skreta, Naruki Yoshikawa, Sebastian Arellano-Rubach, Zhi Ji, Lasse Bjørn Kristensen, Kourosh Darvish, Alán Aspuru-Guzik, Florian Shkurti, Animesh Garg

Generating low-level robot task plans from high-level natural language instructions remains a challenging problem. Although large language models have shown promising results in generating plans, the accuracy of the output remains unverified. Furthermore, the lack of domain-specific language data limits the applicability of these models. In this paper, we propose CLAIRIFY, a novel approach that combines automatic iterative prompting with program verification to ensure that programs written in a data-scarce domain-specific language are syntactically valid and incorporate environment constraints. Our approach provides effective guidance to the language model for generating structured task plans by incorporating any errors as feedback, while the verifier ensures the syntactic accuracy of the generated plans. We demonstrate the effectiveness of CLAIRIFY in planning chemistry experiments, achieving state-of-the-art results. We also show that the generated plans can be executed on a real robot by integrating them with a task and motion planner.
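
The control flow of verifier-assisted iterative prompting is simple to sketch. In this skeleton, `generate` and `verify` are hypothetical placeholders: the first would call a language model, the second a syntax and constraint checker for the target domain-specific language.

```python
# Skeleton of verifier-assisted iterative prompting in the spirit of
# CLAIRIFY. `generate` and `verify` are placeholders to be filled in.
def generate(prompt: str) -> str:
    raise NotImplementedError("call your language model here")

def verify(program: str) -> list[str]:
    raise NotImplementedError("return a list of syntax/constraint errors")

def iterative_prompting(instruction: str, max_rounds: int = 5) -> str:
    prompt = f"Translate into the target DSL:\n{instruction}"
    for _ in range(max_rounds):
        program = generate(prompt)
        errors = verify(program)
        if not errors:
            return program                     # verified plan
        # Feed the verifier's errors back as part of the next prompt.
        prompt = (f"Translate into the target DSL:\n{instruction}\n"
                  f"Previous attempt:\n{program}\n"
                  f"Fix these errors:\n" + "\n".join(errors))
    raise RuntimeError("no verified program within the round budget")
```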

MVTrans: Multi-View Perception of Transparent Objects

Feb 22, 2023
Yi Ru Wang, Yuchi Zhao, Haoping Xu, Saggi Eppel, Alan Aspuru-Guzik, Florian Shkurti, Animesh Garg

Transparent object perception is a crucial skill for applications such as robot manipulation in household and laboratory settings. Existing methods utilize RGB-D or stereo inputs to handle a subset of perception tasks, including depth and pose estimation. However, transparent object perception remains an open problem. In this paper, we forgo the unreliable depth maps from RGB-D sensors and extend a stereo-based method. Our proposed method, MVTrans, is an end-to-end multi-view architecture with multiple perception capabilities, including depth estimation, segmentation, and pose estimation. Additionally, we establish a novel procedural photo-realistic dataset generation pipeline and create a large-scale transparent object detection dataset, Syn-TODD, suitable for training networks with all three input modalities: RGB-D, stereo, and multi-view RGB. Project site: https://ac-rad.github.io/MVTrans/

* Accepted to ICRA 2023; 6 pages, 4 figures, 4 tables 

ORBIT: A Unified Simulation Framework for Interactive Robot Learning Environments

Jan 10, 2023
Mayank Mittal, Calvin Yu, Qinxi Yu, Jingzhou Liu, Nikita Rudin, David Hoeller, Jia Lin Yuan, Pooria Poorsarvi Tehrani, Ritvik Singh, Yunrong Guo, Hammad Mazhar, Ajay Mandlekar, Buck Babich, Gavriel State, Marco Hutter, Animesh Garg

We present ORBIT, a unified and modular framework for robot learning powered by NVIDIA Isaac Sim. Its modular design makes it easy to efficiently create robotic environments with photo-realistic scenes and fast, accurate rigid- and deformable-body simulation. With ORBIT, we provide a suite of benchmark tasks of varying difficulty -- from single-stage cabinet opening and cloth folding to multi-stage tasks such as room reorganization. To support working with diverse observation and action spaces, we include fixed-arm and mobile manipulators with different physically-based sensors and motion generators. ORBIT enables training reinforcement learning policies and collecting large demonstration datasets from hand-crafted or expert solutions in a matter of minutes by leveraging GPU-based parallelization. In summary, we offer an open-source framework that readily comes with 16 robotic platforms, 4 sensor modalities, 10 motion generators, more than 20 benchmark tasks, and wrappers to 4 learning libraries. With this framework, we aim to support various research areas, including representation learning, reinforcement learning, imitation learning, and task and motion planning. We hope it helps establish interdisciplinary collaborations in these communities, and that its modularity makes it easily extensible to more tasks and applications in the future. For videos, documentation, and code: https://isaac-orbit.github.io/.

* Project website: https://isaac-orbit.github.io/ 
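
The throughput claim rests on a standard pattern: keep every environment's state in one batched GPU tensor so that thousands of environments step in a single tensor operation. The sketch below shows that generic pattern with a toy environment; it is not ORBIT's actual API (see the project site for the real interface).

```python
# Generic GPU-parallel rollout pattern (NOT ORBIT's API): all
# environment state lives in one batched tensor, stepped at once.
import torch

class BatchedToyEnv:
    def __init__(self, num_envs=4096, obs_dim=32, device="cpu"):
        self.state = torch.zeros(num_envs, obs_dim, device=device)

    def step(self, actions):
        self.state = self.state + 0.01 * actions     # one batched update
        reward = -self.state.pow(2).sum(dim=-1)      # batched reward
        return self.state, reward

env = BatchedToyEnv()
policy = torch.nn.Linear(32, 32)
obs = env.state
for _ in range(100):            # 100 steps x 4096 envs = 409,600 transitions
    with torch.no_grad():
        actions = policy(obs)
    obs, reward = env.step(actions)
```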

Offline Policy Optimization in RL with Variance Regularization

Dec 29, 2022
Riashat Islam, Samarth Sinha, Homanga Bharadhwaj, Samin Yeasar Arnob, Zhuoran Yang, Animesh Garg, Zhaoran Wang, Lihong Li, Doina Precup

Learning policies from fixed offline datasets is a key challenge in scaling up reinforcement learning (RL) algorithms to practical applications. This is often because off-policy RL algorithms suffer from distributional shift due to the mismatch between the dataset and the target policy, leading to high variance and over-estimation of value functions. In this work, we propose variance regularization for offline RL algorithms, using stationary distribution corrections. We show that by using Fenchel duality, we can avoid the double-sampling issue in computing the gradient of the variance regularizer. The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithm. We show that the regularizer leads to a lower bound on the offline policy optimization objective, which can help avoid over-estimation errors, and explains the benefits of our approach across a range of continuous control domains when compared to existing state-of-the-art algorithms.

* Old draft; Offline RL Workshop, NeurIPS'20 
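
The Fenchel-duality step can be made concrete. Squaring an expectation is what forces double sampling: an unbiased gradient of $(\mathbb{E}[X])^2$ requires two independent samples of $X$. The standard identity below (my notation, not necessarily the paper's) replaces the square with a maximization whose inner objective is linear in $\mathbb{E}[X]$, so a single sample suffices.

```latex
% Variance of the correction-weighted return X, split into moments:
\mathrm{Var}(X) \;=\; \mathbb{E}[X^2] \;-\; \big(\mathbb{E}[X]\big)^2 .
% Fenchel dual of the square, x^2 = \max_{\nu} (2\nu x - \nu^2), gives
\big(\mathbb{E}[X]\big)^2 \;=\; \max_{\nu}\ \big( 2\nu\,\mathbb{E}[X] - \nu^2 \big),
\qquad \nu^{*} = \mathbb{E}[X].
% The inner objective is linear in \mathbb{E}[X], so a single-sample
% Monte Carlo estimate of its gradient is unbiased: no double sampling.
```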