Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Dinesh Jayaraman

Leveraging Symmetry to Accelerate Learning of Trajectory Tracking Controllers for Free-Flying Robotic Systems

Sep 17, 2024

Jake Welde, Nishanth Rao, Pratik Kunapuli, Dinesh Jayaraman, Vijay Kumar

Figure 1 for Leveraging Symmetry to Accelerate Learning of Trajectory Tracking Controllers for Free-Flying Robotic Systems

Figure 2 for Leveraging Symmetry to Accelerate Learning of Trajectory Tracking Controllers for Free-Flying Robotic Systems

Abstract:Tracking controllers enable robotic systems to accurately follow planned reference trajectories. In particular, reinforcement learning (RL) has shown promise in the synthesis of controllers for systems with complex dynamics and modest online compute budgets. However, the poor sample efficiency of RL and the challenges of reward design make training slow and sometimes unstable, especially for high-dimensional systems. In this work, we leverage the inherent Lie group symmetries of robotic systems with a floating base to mitigate these challenges when learning tracking controllers. We model a general tracking problem as a Markov decision process (MDP) that captures the evolution of both the physical and reference states. Next, we prove that symmetry in the underlying dynamics and running costs leads to an MDP homomorphism, a mapping that allows a policy trained on a lower-dimensional "quotient" MDP to be lifted to an optimal tracking controller for the original system. We compare this symmetry-informed approach to an unstructured baseline, using Proximal Policy Optimization (PPO) to learn tracking controllers for three systems: the Particle (a forced point mass), the Astrobee (a fullyactuated space robot), and the Quadrotor (an underactuated system). Results show that a symmetry-aware approach both accelerates training and reduces tracking error after the same number of training steps.

* The first three authors contributed equally to this work

Via

Access Paper or Ask Questions

DrEureka: Language Model Guided Sim-To-Real Transfer

Jun 04, 2024

Yecheng Jason Ma, William Liang, Hung-Ju Wang, Sam Wang, Yuke Zhu, Linxi Fan, Osbert Bastani, Dinesh Jayaraman

Figure 1 for DrEureka: Language Model Guided Sim-To-Real Transfer

Figure 2 for DrEureka: Language Model Guided Sim-To-Real Transfer

Figure 3 for DrEureka: Language Model Guided Sim-To-Real Transfer

Figure 4 for DrEureka: Language Model Guided Sim-To-Real Transfer

Abstract:Transferring policies learned in simulation to the real world is a promising strategy for acquiring robot skills at scale. However, sim-to-real approaches typically rely on manual design and tuning of the task reward function as well as the simulation physics parameters, rendering the process slow and human-labor intensive. In this paper, we investigate using Large Language Models (LLMs) to automate and accelerate sim-to-real design. Our LLM-guided sim-to-real approach, DrEureka, requires only the physics simulation for the target task and automatically constructs suitable reward functions and domain randomization distributions to support real-world transfer. We first demonstrate that our approach can discover sim-to-real configurations that are competitive with existing human-designed ones on quadruped locomotion and dexterous manipulation tasks. Then, we showcase that our approach is capable of solving novel robot tasks, such as quadruped balancing and walking atop a yoga ball, without iterative manual design.

* Robotics: Science and Systems (RSS) 2024. Project website and open-source code: https://eureka-research.github.io/dr-eureka/

Via

Access Paper or Ask Questions

Recasting Generic Pretrained Vision Transformers As Object-Centric Scene Encoders For Manipulation Policies

May 24, 2024

Jianing Qian, Anastasios Panagopoulos, Dinesh Jayaraman

Figure 1 for Recasting Generic Pretrained Vision Transformers As Object-Centric Scene Encoders For Manipulation Policies

Figure 2 for Recasting Generic Pretrained Vision Transformers As Object-Centric Scene Encoders For Manipulation Policies

Figure 3 for Recasting Generic Pretrained Vision Transformers As Object-Centric Scene Encoders For Manipulation Policies

Figure 4 for Recasting Generic Pretrained Vision Transformers As Object-Centric Scene Encoders For Manipulation Policies

Abstract:Generic re-usable pre-trained image representation encoders have become a standard component of methods for many computer vision tasks. As visual representations for robots however, their utility has been limited, leading to a recent wave of efforts to pre-train robotics-specific image encoders that are better suited to robotic tasks than their generic counterparts. We propose Scene Objects From Transformers, abbreviated as SOFT, a wrapper around pre-trained vision transformer (PVT) models that bridges this gap without any further training. Rather than construct representations out of only the final layer activations, SOFT individuates and locates object-like entities from PVT attentions, and describes them with PVT activations, producing an object-centric embedding. Across standard choices of generic pre-trained vision transformers PVT, we demonstrate in each case that policies trained on SOFT(PVT) far outstrip standard PVT representations for manipulation tasks in simulated and real settings, approaching the state-of-the-art robotics-aware representations. Code, appendix and videos: https://sites.google.com/view/robot-soft/

* Accepted to International Conference on Robotics and Automation(ICRA) 2024

Via

Access Paper or Ask Questions

Privileged Sensing Scaffolds Reinforcement Learning

May 23, 2024

Edward S. Hu, James Springer, Oleh Rybkin, Dinesh Jayaraman

Figure 1 for Privileged Sensing Scaffolds Reinforcement Learning

Figure 2 for Privileged Sensing Scaffolds Reinforcement Learning

Figure 3 for Privileged Sensing Scaffolds Reinforcement Learning

Figure 4 for Privileged Sensing Scaffolds Reinforcement Learning

Abstract:We need to look at our shoelaces as we first learn to tie them but having mastered this skill, can do it from touch alone. We call this phenomenon "sensory scaffolding": observation streams that are not needed by a master might yet aid a novice learner. We consider such sensory scaffolding setups for training artificial agents. For example, a robot arm may need to be deployed with just a low-cost, robust, general-purpose camera; yet its performance may improve by having privileged training-time-only access to informative albeit expensive and unwieldy motion capture rigs or fragile tactile sensors. For these settings, we propose "Scaffolder", a reinforcement learning approach which effectively exploits privileged sensing in critics, world models, reward estimators, and other such auxiliary components that are only used at training time, to improve the target policy. For evaluating sensory scaffolding agents, we design a new "S3" suite of ten diverse simulated robotic tasks that explore a wide range of practical sensor setups. Agents must use privileged camera sensing to train blind hurdlers, privileged active visual perception to help robot arms overcome visual occlusions, privileged touch sensors to train robot hands, and more. Scaffolder easily outperforms relevant prior baselines and frequently performs comparably even to policies that have test-time access to the privileged sensors. Website: https://penn-pal-lab.github.io/scaffolder/

* ICLR 2024 Spotlight version

Via

Access Paper or Ask Questions

Composing Pre-Trained Object-Centric Representations for Robotics From "What" and "Where" Foundation Models

Apr 20, 2024

Junyao Shi, Jianing Qian, Yecheng Jason Ma, Dinesh Jayaraman

Figure 1 for Composing Pre-Trained Object-Centric Representations for Robotics From "What" and "Where" Foundation Models

Figure 2 for Composing Pre-Trained Object-Centric Representations for Robotics From "What" and "Where" Foundation Models

Figure 3 for Composing Pre-Trained Object-Centric Representations for Robotics From "What" and "Where" Foundation Models

Figure 4 for Composing Pre-Trained Object-Centric Representations for Robotics From "What" and "Where" Foundation Models

Abstract:There have recently been large advances both in pre-training visual representations for robotic control and segmenting unknown category objects in general images. To leverage these for improved robot learning, we propose $\textbf{POCR}$, a new framework for building pre-trained object-centric representations for robotic control. Building on theories of "what-where" representations in psychology and computer vision, we use segmentations from a pre-trained model to stably locate across timesteps, various entities in the scene, capturing "where" information. To each such segmented entity, we apply other pre-trained models that build vector descriptions suitable for robotic control tasks, thus capturing "what" the entity is. Thus, our pre-trained object-centric representations for control are constructed by appropriately combining the outputs of off-the-shelf pre-trained models, with no new training. On various simulated and real robotic tasks, we show that imitation policies for robotic manipulators trained on POCR achieve better performance and systematic generalization than state of the art pre-trained representations for robotics, as well as prior object-centric representations that are typically trained from scratch.

* ICRA 2024. Project website: https://sites.google.com/view/pocr

Via

Access Paper or Ask Questions

DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

Mar 19, 2024

Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis(+89 more)

Figure 1 for DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

Figure 2 for DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

Figure 3 for DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

Figure 4 for DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

Abstract:The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a result, even the most general robot manipulation policies today are mostly trained on data collected in a small number of environments with limited scene and task diversity. In this work, we introduce DROID (Distributed Robot Interaction Dataset), a diverse robot manipulation dataset with 76k demonstration trajectories or 350 hours of interaction data, collected across 564 scenes and 84 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. We demonstrate that training with DROID leads to policies with higher performance and improved generalization ability. We open source the full dataset, policy learning code, and a detailed guide for reproducing our robot hardware setup.

* Project website: https://droid-dataset.github.io/

Via

Access Paper or Ask Questions

Can Transformers Capture Spatial Relations between Objects?

Mar 01, 2024

Chuan Wen, Dinesh Jayaraman, Yang Gao

Abstract:Spatial relationships between objects represent key scene information for humans to understand and interact with the world. To study the capability of current computer vision systems to recognize physically grounded spatial relations, we start by proposing precise relation definitions that permit consistently annotating a benchmark dataset. Despite the apparent simplicity of this task relative to others in the recognition literature, we observe that existing approaches perform poorly on this benchmark. We propose new approaches exploiting the long-range attention capabilities of transformers for this task, and evaluating key design principles. We identify a simple "RelatiViT" architecture and demonstrate that it outperforms all current approaches. To our knowledge, this is the first method to convincingly outperform naive baselines on spatial relation prediction in in-the-wild settings. The code and datasets are available in \url{https://sites.google.com/view/spatial-relation}.

* 21 pages, 8 figures, ICLR 2024

Via

Access Paper or Ask Questions

DiffusionPhase: Motion Diffusion in Frequency Domain

Dec 07, 2023

Weilin Wan, Yiming Huang, Shutong Wu, Taku Komura, Wenping Wang, Dinesh Jayaraman, Lingjie Liu

Figure 1 for DiffusionPhase: Motion Diffusion in Frequency Domain

Figure 2 for DiffusionPhase: Motion Diffusion in Frequency Domain

Figure 3 for DiffusionPhase: Motion Diffusion in Frequency Domain

Figure 4 for DiffusionPhase: Motion Diffusion in Frequency Domain

Abstract:In this study, we introduce a learning-based method for generating high-quality human motion sequences from text descriptions (e.g., ``A person walks forward"). Existing techniques struggle with motion diversity and smooth transitions in generating arbitrary-length motion sequences, due to limited text-to-motion datasets and the pose representations used that often lack expressiveness or compactness. To address these issues, we propose the first method for text-conditioned human motion generation in the frequency domain of motions. We develop a network encoder that converts the motion space into a compact yet expressive parameterized phase space with high-frequency details encoded, capturing the local periodicity of motions in time and space with high accuracy. We also introduce a conditional diffusion model for predicting periodic motion parameters based on text descriptions and a start pose, efficiently achieving smooth transitions between motion sequences associated with different text descriptions. Experiments demonstrate that our approach outperforms current methods in generating a broader variety of high-quality motions, and synthesizing long sequences with natural transitions.

Via

Access Paper or Ask Questions

TLControl: Trajectory and Language Control for Human Motion Synthesis

Nov 30, 2023

Weilin Wan, Zhiyang Dou, Taku Komura, Wenping Wang, Dinesh Jayaraman, Lingjie Liu

Abstract:Controllable human motion synthesis is essential for applications in AR/VR, gaming, movies, and embodied AI. Existing methods often focus solely on either language or full trajectory control, lacking precision in synthesizing motions aligned with user-specified trajectories, especially for multi-joint control. To address these issues, we present TLControl, a new method for realistic human motion synthesis, incorporating both low-level trajectory and high-level language semantics controls. Specifically, we first train a VQ-VAE to learn a compact latent motion space organized by body parts. We then propose a Masked Trajectories Transformer to make coarse initial predictions of full trajectories of joints based on the learned latent motion space, with user-specified partial trajectories and text descriptions as conditioning. Finally, we introduce an efficient test-time optimization to refine these coarse predictions for accurate trajectory control. Experiments demonstrate that TLControl outperforms the state-of-the-art in trajectory accuracy and time efficiency, making it practical for interactive and high-quality animation generation.

Via

Access Paper or Ask Questions

Eureka: Human-Level Reward Design via Coding Large Language Models

Oct 19, 2023

Yecheng Jason Ma, William Liang, Guanzhi Wang, De-An Huang, Osbert Bastani, Dinesh Jayaraman, Yuke Zhu, Linxi Fan, Anima Anandkumar

Figure 1 for Eureka: Human-Level Reward Design via Coding Large Language Models

Figure 2 for Eureka: Human-Level Reward Design via Coding Large Language Models

Figure 3 for Eureka: Human-Level Reward Design via Coding Large Language Models

Figure 4 for Eureka: Human-Level Reward Design via Coding Large Language Models

Abstract:Large Language Models (LLMs) have excelled as high-level semantic planners for sequential decision-making tasks. However, harnessing them to learn complex low-level manipulation tasks, such as dexterous pen spinning, remains an open problem. We bridge this fundamental gap and present Eureka, a human-level reward design algorithm powered by LLMs. Eureka exploits the remarkable zero-shot generation, code-writing, and in-context improvement capabilities of state-of-the-art LLMs, such as GPT-4, to perform evolutionary optimization over reward code. The resulting rewards can then be used to acquire complex skills via reinforcement learning. Without any task-specific prompting or pre-defined reward templates, Eureka generates reward functions that outperform expert human-engineered rewards. In a diverse suite of 29 open-source RL environments that include 10 distinct robot morphologies, Eureka outperforms human experts on 83% of the tasks, leading to an average normalized improvement of 52%. The generality of Eureka also enables a new gradient-free in-context learning approach to reinforcement learning from human feedback (RLHF), readily incorporating human inputs to improve the quality and the safety of the generated rewards without model updating. Finally, using Eureka rewards in a curriculum learning setting, we demonstrate for the first time, a simulated Shadow Hand capable of performing pen spinning tricks, adeptly manipulating a pen in circles at rapid speed.

* Project website and open-source code: https://eureka-research.github.io/

Via

Access Paper or Ask Questions