Vikash Kumar

Natural and Robust Walking using Reinforcement Learning without Demonstrations in High-Dimensional Musculoskeletal Models

Sep 07, 2023
Pierre Schumacher, Thomas Geijtenbeek, Vittorio Caggiano, Vikash Kumar, Syn Schmitt, Georg Martius, Daniel F. B. Haeufle

Humans excel at robust bipedal walking in complex natural environments. In each step, they adequately tune the interaction of biomechanical muscle dynamics and neuronal signals to be robust against uncertainties in ground conditions. However, it is still not fully understood how the nervous system resolves the musculoskeletal redundancy to solve the multi-objective control problem considering stability, robustness, and energy efficiency. In computer simulations, energy minimization has been shown to be a successful optimization target, reproducing natural walking with trajectory optimization or reflex-based control methods. However, these methods focus on one particular motion at a time, and the resulting controllers are limited when compensating for perturbations. In robotics, reinforcement learning (RL) methods recently achieved highly stable (and efficient) locomotion on quadruped systems, but the generation of human-like walking with bipedal biomechanical models has required extensive use of expert data sets. This strong reliance on demonstrations often results in brittle policies and limits the application to new behaviors, especially considering the potential variety of movements for high-dimensional musculoskeletal models in 3D. Achieving natural locomotion with RL without sacrificing its incredible robustness might pave the way for a novel approach to studying human walking in complex natural environments. Videos: https://sites.google.com/view/naturalwalkingrl
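
The multi-objective trade-off described above (stability, robustness, energy efficiency) is often encoded as a weighted reward. The sketch below is purely illustrative, not the paper's actual reward formulation; all names and weights are hypothetical:

```python
def walking_reward(forward_velocity, target_velocity, muscle_activations,
                   upright, w_vel=1.0, w_energy=0.05):
    """Toy multi-objective locomotion reward: track a target speed,
    penalize metabolic effort (sum of squared muscle activations),
    and zero out reward once the model has fallen."""
    if not upright:  # stability term: no reward after a fall
        return 0.0
    vel_term = -w_vel * abs(forward_velocity - target_velocity)
    energy_term = -w_energy * sum(a * a for a in muscle_activations)
    return vel_term + energy_term
```

The relative weight on the energy term controls where the learned gait sits between aggressive speed tracking and the energy-efficient, natural-looking motion the paper targets.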

REBOOT: Reuse Data for Bootstrapping Efficient Real-World Dexterous Manipulation

Sep 06, 2023
Zheyuan Hu, Aaron Rovinsky, Jianlan Luo, Vikash Kumar, Abhishek Gupta, Sergey Levine

Dexterous manipulation tasks involving contact-rich interactions pose a significant challenge for both model-based control systems and imitation learning algorithms. The complexity arises from the need for multi-fingered robotic hands to dynamically establish and break contacts, balance non-prehensile forces, and control many degrees of freedom. Reinforcement learning (RL) offers a promising approach due to its general applicability and capacity to autonomously acquire optimal manipulation strategies. However, its real-world application is often hindered by the necessity to generate a large number of samples, reset the environment, and obtain reward signals. In this work, we introduce an efficient system for learning dexterous manipulation skills with RL to alleviate these challenges. The main idea of our approach is the integration of recent advances in sample-efficient RL and replay buffer bootstrapping. This combination allows us to utilize data from different tasks or objects as a starting point for training new tasks, significantly improving learning efficiency. Additionally, our system completes the real-world training cycle by incorporating learned resets via an imitation-based pickup policy as well as learned reward functions, eliminating the need for manual resets and reward engineering. We demonstrate the benefits of reusing past data as replay buffer initialization for new tasks, for instance, the fast acquisition of intricate manipulation skills in the real world on a four-fingered robotic hand. (Videos: https://sites.google.com/view/reboot-dexterous)
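
Replay-buffer bootstrapping, the core idea above, can be sketched in a few lines. This is a minimal hypothetical illustration, not the REBOOT codebase; `ReplayBuffer` and `bootstrap_buffer` are made-up names:

```python
import random


class ReplayBuffer:
    """Minimal FIFO replay buffer of (obs, action, reward, next_obs) tuples."""

    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self.storage = []

    def add(self, transition):
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)  # evict the oldest transition
        self.storage.append(transition)

    def sample(self, batch_size):
        return random.sample(self.storage, batch_size)


def bootstrap_buffer(prior_transitions, capacity=10_000):
    """Seed a new task's buffer with transitions from previous tasks/objects,
    so off-policy RL starts from informative data instead of an empty buffer."""
    buf = ReplayBuffer(capacity)
    for t in prior_transitions:
        buf.add(t)
    return buf
```

New on-robot experience is then simply `add`ed on top of the prior data, and gradient updates sample a mixture of old and new transitions from the start of training.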

* Accepted at CoRL 2023. The first two authors contributed equally.

MyoDex: A Generalizable Prior for Dexterous Manipulation

Sep 06, 2023
Vittorio Caggiano, Sudeep Dasari, Vikash Kumar

Human dexterity is a hallmark of motor control. Our hands can rapidly synthesize new behaviors despite the complexity (multi-articular and multi-joint, with 23 joints controlled by more than 40 muscles) of musculoskeletal sensory-motor circuits. In this work, we take inspiration from how human dexterity builds on a diversity of prior experiences, instead of being acquired through a single task. Motivated by this observation, we set out to develop agents that can build upon their previous experience to quickly acquire new (previously unattainable) behaviors. Specifically, our approach leverages multi-task learning to implicitly capture task-agnostic behavioral priors (MyoDex) for human-like dexterity, using a physiologically realistic human hand model - MyoHand. We demonstrate MyoDex's effectiveness in few-shot generalization as well as positive transfer to a large repertoire of unseen dexterous manipulation tasks. Agents leveraging MyoDex can solve approximately 3x more tasks, and 4x faster in comparison to a distillation baseline. While prior work has synthesized single musculoskeletal control behaviors, MyoDex is the first generalizable manipulation prior that catalyzes the learning of dexterous physiological control across a large variety of contact-rich behaviors. We also demonstrate the effectiveness of our paradigms beyond musculoskeletal control towards the acquisition of dexterity in the 24-DoF Adroit Hand. Website: https://sites.google.com/view/myodex
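
The benefit of a multi-task prior can be illustrated with a deliberately tiny toy: "pretrain" on a family of related 1-D quadratic tasks, then fine-tune on a new one. This is only an analogy for the idea of reusing prior experience, not MyoDex's actual training procedure; every function and constant here is hypothetical:

```python
def train(theta, task_optimum, lr=0.2, tol=1e-3, max_steps=1000):
    """Gradient descent on the 1-D loss (theta - optimum)^2.
    Returns (final theta, steps needed to get within tol of the optimum)."""
    for step in range(max_steps):
        if abs(theta - task_optimum) < tol:
            return theta, step
        theta -= lr * 2 * (theta - task_optimum)
    return theta, max_steps


# "Pretraining": the solution averaged over a family of related tasks
# plays the role of a task-agnostic prior.
pretrain_tasks = [0.8, 1.0, 1.2]
prior = sum(train(0.0, t)[0] for t in pretrain_tasks) / len(pretrain_tasks)

# A new, related task is reached in far fewer steps when starting from the
# prior than from scratch -- the essence of reusing a behavioral prior.
_, steps_from_prior = train(prior, 1.1)
_, steps_from_scratch = train(0.0, 1.1)
```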

* Accepted to the 40th International Conference on Machine Learning (2023) 

RoboAgent: Generalization and Efficiency in Robot Manipulation via Semantic Augmentations and Action Chunking

Sep 05, 2023
Homanga Bharadhwaj, Jay Vakil, Mohit Sharma, Abhinav Gupta, Shubham Tulsiani, Vikash Kumar

The grand aim of having a single robot that can manipulate arbitrary objects in diverse settings is at odds with the paucity of robotics datasets. Acquiring and growing such datasets is strenuous due to manual efforts, operational costs, and safety challenges. A path toward such a universal agent would require a structured framework capable of wide generalization but trained within a reasonable data budget. In this paper, we develop an efficient system (RoboAgent) for training universal agents capable of multi-task manipulation skills using (a) semantic augmentations that can rapidly multiply existing datasets and (b) action representations that can extract performant policies with small yet diverse multi-modal datasets without overfitting. In addition, reliable task conditioning and an expressive policy architecture enable our agent to exhibit a diverse repertoire of skills in novel situations specified using language commands. Using merely 7500 demonstrations, we are able to train a single agent capable of 12 unique skills, and demonstrate its generalization over 38 tasks spread across common daily activities in diverse kitchen scenes. On average, RoboAgent outperforms prior methods by over 40% in unseen situations while being more sample-efficient and amenable to capability improvements and extensions through fine-tuning. Videos at https://robopen.github.io/
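
Action chunking, one of the ingredients above, can be sketched as follows. This is a generic illustration under assumed interfaces (a `policy` that returns a sequence of actions and an `env_step` that advances the state), not RoboAgent's implementation:

```python
def execute_with_chunking(policy, env_step, obs, horizon, chunk_size):
    """Action chunking: query the policy once for a short sequence of
    actions, execute the whole chunk open-loop, then re-plan. This gives
    fewer policy queries and temporally smoother behavior than predicting
    a single action at every step."""
    actions_taken = []
    t = 0
    while t < horizon:
        chunk = policy(obs)[:chunk_size]  # policy outputs an action sequence
        for action in chunk:
            obs = env_step(obs, action)
            actions_taken.append(action)
            t += 1
            if t >= horizon:
                break
    return obs, actions_taken
```

The chunk size trades off reactivity (short chunks re-plan often) against smoothness and data efficiency (long chunks commit to coherent multi-step motions).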

Modified Lagrangian Formulation of Gear Tooth Crack Analysis using Combined Approach of Variable Mode Decomposition (VMD) and Time Synchronous Averaging (TSA)

Aug 29, 2023
Subrata Mukherjee, Vikash Kumar, Somnath Sarangi

This paper presents an integrated gear tooth crack analysis procedure that employs the combined approach of variable mode decomposition (VMD) and time synchronous averaging (TSA), based on a coupled electromechanical gearbox (CEMG) system. The CEMG system is modeled with a modified Lagrangian formulation that accounts for Rayleigh's dissipative potential. An improved analytical model of time-varying mesh stiffness (IAM-TVMS) with different levels of gear tooth crack depth is also incorporated into the CEMG system to inspect the influence of cracks on the system's dynamic behavior. Dynamic responses of the CEMG system with different tooth crack levels are used for the subsequent investigations. For the first time, the integrated VMD-TSA approach is applied to analyze the dynamic behavior of the CEMG system at different gear tooth crack levels, whose responses are non-stationary, complex vibration signals corrupted by noise. Based on this integrated approach, two nonlinear features, the Lyapunov Exponent (LE) and the Correlation Dimension (CD), are calculated to quantify the level of chaotic vibration and the complexity of the CEMG system, and are used as chaotic-behavior features to predict the gear tooth crack propagation level. The results show significant improvements in gear tooth crack analysis based on these chaotic features; to our knowledge, this is one of the first attempts to study a CEMG system using chaotic features derived from the combined VMD-TSA approach.
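
The TSA half of the pipeline can be sketched as plain averaging over rotation periods. This is a minimal textbook illustration, not the paper's implementation, and it assumes an integer number of samples per shaft revolution:

```python
def time_synchronous_average(signal, period):
    """Time synchronous averaging: fold the vibration signal into segments
    of one shaft-rotation period and average them sample-by-sample.
    Components synchronous with the rotation (e.g. a cracked-tooth
    signature) are reinforced, while asynchronous noise averages toward
    zero."""
    n_cycles = len(signal) // period
    return [
        sum(signal[c * period + i] for c in range(n_cycles)) / n_cycles
        for i in range(period)
    ]
```

In practice the signal is first resampled against a tachometer reference so that each revolution contains exactly `period` samples before averaging.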

* 17 pages, 36 figures, 6th Joint International Conference on Multibody System Dynamics and the 10th Asian Conference on Multibody Dynamics 2022 

Integrated Approach of Gearbox Fault Diagnosis

Aug 27, 2023
Vikash Kumar, Subrata Mukherjee, Somnath Sarangi

Gearbox fault diagnosis is one of the most important tasks in any industrial system. Failure of components inside the gearbox can lead to catastrophic failures, unplanned breakdowns, and financial losses for an industrial organization; intelligent maintenance of the gearbox is therefore essential. This paper presents an integrated gearbox fault diagnosis approach that can easily be deployed for online condition monitoring. This work introduces a nonparametric data preprocessing technique, the calculus-enhanced energy operator (CEEO), to preserve the characteristic frequencies in noisy, interference-corrupted vibration signals. A set of time-domain and spectral-domain features is calculated from the raw and CEEO vibration signals and fed to a multiclass support vector machine (MCSVM) to diagnose faults in the system. A comparison between the raw and CEEO signals is presented to show the impact of CEEO on gearbox fault diagnosis. The obtained results are promising, and the method's nonparametric nature makes it applicable to a wide range of industrial systems.
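
As a rough illustration of energy-operator preprocessing, the classical discrete Teager-Kaiser operator can be written in one line. The CEEO proposed in the paper is a calculus-enhanced refinement of this operator, so treat the following only as a simplified stand-in:

```python
def teager_kaiser(x):
    """Discrete Teager-Kaiser energy operator:
        psi[n] = x[n]^2 - x[n-1] * x[n+1]
    It tracks instantaneous signal energy and sharpens transient impulses,
    such as those produced by a faulty gear tooth, which is why
    energy-operator variants are useful as a preprocessing step before
    feature extraction."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
```

In a pipeline like the one above, time-domain and spectral features (RMS, kurtosis, band energies, etc.) would then be computed from both the raw and the operator-processed signal and passed to the classifier.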

SAR: Generalization of Physiological Agility and Dexterity via Synergistic Action Representation

Jul 14, 2023
Cameron Berg, Vittorio Caggiano, Vikash Kumar

Learning effective continuous control policies in high-dimensional systems, including musculoskeletal agents, remains a significant challenge. Over the course of biological evolution, organisms have developed robust mechanisms for overcoming this complexity to learn highly sophisticated strategies for motor control. What accounts for this robust behavioral flexibility? Modular control via muscle synergies, i.e., coordinated muscle co-contractions, is considered to be one putative mechanism that enables organisms to learn muscle control in a simplified and generalizable action space. Drawing inspiration from this evolved motor control strategy, we use physiologically accurate human hand and leg models as a testbed for determining the extent to which a Synergistic Action Representation (SAR) acquired from simpler tasks facilitates learning more complex tasks. We find in both cases that SAR-exploiting policies significantly outperform end-to-end reinforcement learning. Policies trained with SAR were able to achieve robust locomotion on a wide set of terrains with high sample efficiency, while baseline approaches failed to learn meaningful behaviors. Additionally, policies trained with SAR on a multi-object manipulation task significantly outperformed (>70% success) baseline approaches (<20% success). Both of these SAR-exploiting policies were also found to generalize zero-shot to out-of-domain environmental conditions, while policies that did not adopt SAR failed to generalize. Finally, we establish the generality of SAR on broader high-dimensional control problems using a robotic manipulation task set and a full-body humanoid locomotion task. To the best of our knowledge, this investigation is the first of its kind to present an end-to-end pipeline for discovering synergies and using this representation to learn high-dimensional continuous control across a wide diversity of tasks.
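
The core mechanism, acting in a low-dimensional synergy space that is expanded to full muscle activations, can be sketched as a linear map. In SAR the synergy matrix is derived from data collected on simpler tasks (e.g., via dimensionality reduction over recorded activations), whereas here it is just a fixed illustrative matrix:

```python
def project_action(synergies, z):
    """Map a low-dimensional synergy activation z to full muscle space:
    a = W z, where column k of W is one muscle synergy (a fixed pattern of
    co-contraction across all muscles). The RL policy acts in the small
    z-space while the musculoskeletal model still receives a full
    activation vector.

    `synergies` has shape (n_muscles, n_synergies), stored row-per-muscle."""
    return [
        sum(row[k] * z[k] for k in range(len(z)))
        for row in synergies
    ]
```

Because the policy only outputs `len(z)` values instead of one per muscle, exploration happens in a far smaller, behaviorally meaningful space, which is what drives the sample-efficiency gains reported above.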

* Presented at RSS 2023 

LIV: Language-Image Representations and Rewards for Robotic Control

Jun 01, 2023
Yecheng Jason Ma, William Liang, Vaidehi Som, Vikash Kumar, Amy Zhang, Osbert Bastani, Dinesh Jayaraman

We present Language-Image Value learning (LIV), a unified objective for vision-language representation and reward learning from action-free videos with text annotations. Exploiting a novel connection between dual reinforcement learning and mutual information contrastive learning, the LIV objective trains a multi-modal representation that implicitly encodes a universal value function for tasks specified as language or image goals. We use LIV to pre-train the first control-centric vision-language representation from large human video datasets such as EpicKitchen. Given only a language or image goal, the pre-trained LIV model can assign dense rewards to each frame in videos of unseen robots or humans attempting that task in unseen environments. Further, when some target domain-specific data is available, the same objective can be used to fine-tune and improve LIV and even other pre-trained representations for robotic control and reward specification in that domain. In our experiments on several simulated and real-world robot environments, LIV models consistently outperform the best prior input state representations for imitation learning, as well as reward specification methods for policy synthesis. Our results validate the advantages of joint vision-language representation and reward learning within the unified, compact LIV framework.
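
One way a pre-trained embedding like LIV's can be used for reward assignment is sketched below: score each frame by its similarity to the goal embedding and reward increases in that similarity. This is a simplified illustration of the reward-labeling use case, not LIV's actual value function or training objective:

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def dense_rewards(frame_embeddings, goal_embedding):
    """Label each transition in a video with the increase in similarity
    between the frame embedding and the goal embedding: frames that move
    toward the (language or image) goal receive positive reward."""
    sims = [cosine(f, goal_embedding) for f in frame_embeddings]
    return [sims[t + 1] - sims[t] for t in range(len(sims) - 1)]
```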

* Extended version of ICML 2023 camera-ready; Project website: https://penn-pal-lab.github.io/LIV/ 

TorchRL: A data-driven decision-making library for PyTorch

Jun 01, 2023
Albert Bou, Matteo Bettini, Sebastian Dittert, Vikash Kumar, Shagun Sodhani, Xiaomeng Yang, Gianni De Fabritiis, Vincent Moens

Striking a balance between integration and modularity is crucial for a machine learning library to be versatile and user-friendly, especially in handling decision and control tasks that involve large development teams and complex, real-world data and environments. To address this issue, we propose TorchRL, a general-purpose control library for PyTorch that provides well-integrated, yet standalone components. With a versatile and robust primitive design, TorchRL facilitates streamlined algorithm development across the many branches of Reinforcement Learning (RL) and control. We introduce a new PyTorch primitive, TensorDict, a flexible data carrier that enables the integration of the library's components while preserving their modularity. Hence, replay buffers, datasets, distributed data collectors, environments, transforms, and objectives can be effortlessly used in isolation or combined. We provide a detailed description of the building blocks, supporting code examples, and an extensive overview of the library across domains and tasks. Finally, we show comparative benchmarks to demonstrate its computational efficiency. TorchRL fosters long-term support and is open-sourced at https://github.com/pytorch/rl for greater reproducibility and collaboration within the research community.
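
The TensorDict idea, a dictionary of consistently batched arrays that can be indexed and passed around as a single unit, can be illustrated with a dependency-free toy. The real `TensorDict` lives in the `tensordict` package, operates on torch tensors, and additionally supports nesting, device placement, and lazy operations; the class below only mimics the basic concept:

```python
class ToyTensorDict:
    """Dependency-free illustration of the TensorDict concept: several
    named fields that share a leading batch dimension and are indexed
    together, so components can exchange whole batches of heterogeneous
    data through one object."""

    def __init__(self, data, batch_size):
        for key, values in data.items():
            # every field must agree on the shared batch dimension
            assert len(values) == batch_size, f"{key} does not match batch_size"
        self.data = dict(data)
        self.batch_size = batch_size

    def __getitem__(self, idx):
        # index every field at once, returning one "element" of the batch
        return {key: values[idx] for key, values in self.data.items()}

    def keys(self):
        return self.data.keys()
```

A replay buffer, a collector, and a loss module can all consume and produce such a keyed, batched carrier without agreeing on positional argument orders, which is the modularity argument made above.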
