Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jan Peters

PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations

Jul 25, 2024

Cheng Qian, Julen Urain, Kevin Zakka, Jan Peters

Abstract:In this work, we introduce PianoMime, a framework for training a piano-playing agent using internet demonstrations. The internet is a promising source of large-scale demonstrations for training our robot agents. In particular, for the case of piano-playing, Youtube is full of videos of professional pianists playing a wide myriad of songs. In our work, we leverage these demonstrations to learn a generalist piano-playing agent capable of playing any arbitrary song. Our framework is divided into three parts: a data preparation phase to extract the informative features from the Youtube videos, a policy learning phase to train song-specific expert policies from the demonstrations and a policy distillation phase to distil the policies into a single generalist agent. We explore different policy designs to represent the agent and evaluate the influence of the amount of training data on the generalization capability of the agent to novel songs not available in the dataset. We show that we are able to learn a policy with up to 56\% F1 score on unseen songs.

Via

Access Paper or Ask Questions

Exciting Action: Investigating Efficient Exploration for Learning Musculoskeletal Humanoid Locomotion

Jul 16, 2024

Henri-Jacques Geiß, Firas Al-Hafez, Andre Seyfarth, Jan Peters, Davide Tateo

Abstract:Learning a locomotion controller for a musculoskeletal system is challenging due to over-actuation and high-dimensional action space. While many reinforcement learning methods attempt to address this issue, they often struggle to learn human-like gaits because of the complexity involved in engineering an effective reward function. In this paper, we demonstrate that adversarial imitation learning can address this issue by analyzing key problems and providing solutions using both current literature and novel techniques. We validate our methodology by learning walking and running gaits on a simulated humanoid model with 16 degrees of freedom and 92 Muscle-Tendon Units, achieving natural-looking gaits with only a few demonstrations.

Via

Access Paper or Ask Questions

MoVEInt: Mixture of Variational Experts for Learning Human-Robot Interactions from Demonstrations

Jul 10, 2024

Vignesh Prasad, Alap Kshirsagar, Dorothea Koert, Ruth Stock-Homburg, Jan Peters, Georgia Chalvatzaki

Abstract:Shared dynamics models are important for capturing the complexity and variability inherent in Human-Robot Interaction (HRI). Therefore, learning such shared dynamics models can enhance coordination and adaptability to enable successful reactive interactions with a human partner. In this work, we propose a novel approach for learning a shared latent space representation for HRIs from demonstrations in a Mixture of Experts fashion for reactively generating robot actions from human observations. We train a Variational Autoencoder (VAE) to learn robot motions regularized using an informative latent space prior that captures the multimodality of the human observations via a Mixture Density Network (MDN). We show how our formulation derives from a Gaussian Mixture Regression formulation that is typically used approaches for learning HRI from demonstrations such as using an HMM/GMM for learning a joint distribution over the actions of the human and the robot. We further incorporate an additional regularization to prevent "mode collapse", a common phenomenon when using latent space mixture models with VAEs. We find that our approach of using an informative MDN prior from human observations for a VAE generates more accurate robot motions compared to previous HMM-based or recurrent approaches of learning shared latent representations, which we validate on various HRI datasets involving interactions such as handshakes, fistbumps, waving, and handovers. Further experiments in a real-world human-to-robot handover scenario show the efficacy of our approach for generating successful interactions with four different human interaction partners.

* Preprint version of paper accepted at IEEE RAL. Project URL: https://bit.ly/MoVEInt

Via

Access Paper or Ask Questions

Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model

Jul 05, 2024

Duy M. H. Nguyen, An T. Le, Trung Q. Nguyen, Nghiem T. Diep, Tai Nguyen, Duy Duong-Tran, Jan Peters, Li Shen, Mathias Niepert, Daniel Sonntag

Figure 1 for Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model

Figure 2 for Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model

Figure 3 for Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model

Figure 4 for Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model

Abstract:Prompt learning methods are gaining increasing attention due to their ability to customize large vision-language models to new domains using pre-trained contextual knowledge and minimal training data. However, existing works typically rely on optimizing unified prompt inputs, often struggling with fine-grained classification tasks due to insufficient discriminative attributes. To tackle this, we consider a new framework based on a dual context of both domain-shared and class-specific contexts, where the latter is generated by Large Language Models (LLMs) such as GPTs. Such dual prompt methods enhance the model's feature representation by joining implicit and explicit factors encoded in LLM knowledge. Moreover, we formulate the Unbalanced Optimal Transport (UOT) theory to quantify the relationships between constructed prompts and visual tokens. Through partial matching, UOT can properly align discrete sets of visual tokens and prompt embeddings under different mass distributions, which is particularly valuable for handling irrelevant or noisy elements, ensuring that the preservation of mass does not restrict transport solutions. Furthermore, UOT's characteristics integrate seamlessly with image augmentation, expanding the training sample pool while maintaining a reasonable distance between perturbed images and prompt inputs. Extensive experiments across few-shot classification and adapter settings substantiate the superiority of our model over current state-of-the-art baselines.

* Version 1

Via

Access Paper or Ask Questions

Energy-based Contact Planning under Uncertainty for Robot Air Hockey

Jul 04, 2024

Julius Jankowski, Ante Marić, Puze Liu, Davide Tateo, Jan Peters, Sylvain Calinon

Abstract:Planning robot contact often requires reasoning over a horizon to anticipate outcomes, making such planning problems computationally expensive. In this letter, we propose a learning framework for efficient contact planning in real-time subject to uncertain contact dynamics. We implement our approach for the example task of robot air hockey. Based on a learned stochastic model of puck dynamics, we formulate contact planning for shooting actions as a stochastic optimal control problem with a chance constraint on hitting the goal. To achieve online re-planning capabilities, we propose to train an energy-based model to generate optimal shooting plans in real time. The performance of the trained policy is validated %in experiments both in simulation and on a real-robot setup. Furthermore, our approach was tested in a competitive setting as part of the NeurIPS 2023 Robot Air Hockey Challenge.

* This work has been submitted to the IEEE Robotics & Automation Letters for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Via

Access Paper or Ask Questions

ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning

Jun 28, 2024

Christopher E. Mower, Yuhui Wan, Hongzhan Yu, Antoine Grosnit, Jonas Gonzalez-Billandon, Matthieu Zimmer, Jinlong Wang, Xinyu Zhang, Yao Zhao, Anbang Zhai(+12 more)

Figure 1 for ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning

Figure 2 for ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning

Figure 3 for ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning

Figure 4 for ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning

Abstract:We present a framework for intuitive robot programming by non-experts, leveraging natural language prompts and contextual information from the Robot Operating System (ROS). Our system integrates large language models (LLMs), enabling non-experts to articulate task requirements to the system through a chat interface. Key features of the framework include: integration of ROS with an AI agent connected to a plethora of open-source and commercial LLMs, automatic extraction of a behavior from the LLM output and execution of ROS actions/services, support for three behavior modes (sequence, behavior tree, state machine), imitation learning for adding new robot actions to the library of possible actions, and LLM reflection via human and environment feedback. Extensive experiments validate the framework, showcasing robustness, scalability, and versatility in diverse scenarios, including long-horizon tasks, tabletop rearrangements, and remote supervisory control. To facilitate the adoption of our framework and support the reproduction of our results, we have made our code open-source. You can access it at: https://github.com/huawei-noah/HEBO/tree/master/ROSLLM.

* This document contains 26 pages and 13 figures

Via

Access Paper or Ask Questions

Adaptive $Q$-Network: On-the-fly Target Selection for Deep Reinforcement Learning

May 25, 2024

Théo Vincent, Fabian Wahren, Jan Peters, Boris Belousov, Carlo D'Eramo

Figure 1 for Adaptive $Q$-Network: On-the-fly Target Selection for Deep Reinforcement Learning

Figure 2 for Adaptive $Q$-Network: On-the-fly Target Selection for Deep Reinforcement Learning

Figure 3 for Adaptive $Q$-Network: On-the-fly Target Selection for Deep Reinforcement Learning

Figure 4 for Adaptive $Q$-Network: On-the-fly Target Selection for Deep Reinforcement Learning

Abstract:Deep Reinforcement Learning (RL) is well known for being highly sensitive to hyperparameters, requiring practitioners substantial efforts to optimize them for the problem at hand. In recent years, the field of automated Reinforcement Learning (AutoRL) has grown in popularity by trying to address this issue. However, these approaches typically hinge on additional samples to select well-performing hyperparameters, hindering sample-efficiency and practicality in RL. Furthermore, most AutoRL methods are heavily based on already existing AutoML methods, which were originally developed neglecting the additional challenges inherent to RL due to its non-stationarities. In this work, we propose a new approach for AutoRL, called Adaptive $Q$-Network (AdaQN), that is tailored to RL to take into account the non-stationarity of the optimization procedure without requiring additional samples. AdaQN learns several $Q$-functions, each one trained with different hyperparameters, which are updated online using the $Q$-function with the smallest approximation error as a shared target. Our selection scheme simultaneously handles different hyperparameters while coping with the non-stationarity induced by the RL optimization procedure and being orthogonal to any critic-based RL algorithm. We demonstrate that AdaQN is theoretically sound and empirically validate it in MuJoCo control problems, showing benefits in sample-efficiency, overall performance, training stability, and robustness to stochasticity.

* Preprint

Via

Access Paper or Ask Questions

Learning Tactile Insertion in the Real World

May 01, 2024

Daniel Palenicek, Theo Gruner, Tim Schneider, Alina Böhm, Janis Lenz, Inga Pfenning, Eric Krämer, Jan Peters

Figure 1 for Learning Tactile Insertion in the Real World

Figure 2 for Learning Tactile Insertion in the Real World

Figure 3 for Learning Tactile Insertion in the Real World

Abstract:Humans have exceptional tactile sensing capabilities, which they can leverage to solve challenging, partially observable tasks that cannot be solved from visual observation alone. Research in tactile sensing attempts to unlock this new input modality for robots. Lately, these sensors have become cheaper and, thus, widely available. At the same time, the question of how to integrate them into control loops is still an active area of research, with central challenges being partial observability and the contact-rich nature of manipulation tasks. In this study, we propose to use Reinforcement Learning to learn an end-to-end policy, mapping directly from tactile sensor readings to actions. Specifically, we use Dreamer-v3 on a challenging, partially observable robotic insertion task with a Franka Research 3, both in simulation and on a real system. For the real setup, we built a robotic platform capable of resetting itself fully autonomously, allowing for extensive training runs without human supervision. Our preliminary results indicate that Dreamer is capable of utilizing tactile inputs to solve robotic manipulation tasks in simulation and reality. Furthermore, we find that providing the robot with tactile feedback generally improves task performance, though, in our setup, we do not yet include other sensing modalities. In the future, we plan to utilize our platform to evaluate a wide range of other Reinforcement Learning algorithms on tactile tasks.

Via

Access Paper or Ask Questions

Integrating Visuo-tactile Sensing with Haptic Feedback for Teleoperated Robot Manipulation

Apr 30, 2024

Noah Becker, Erik Gattung, Kay Hansel, Tim Schneider, Yaonan Zhu, Yasuhisa Hasegawa, Jan Peters

Abstract:Telerobotics enables humans to overcome spatial constraints and allows them to physically interact with the environment in remote locations. However, the sensory feedback provided by the system to the operator is often purely visual, limiting the operator's dexterity in manipulation tasks. In this work, we address this issue by equipping the robot's end-effector with high-resolution visuotactile GelSight sensors. Using low-cost MANUS-Gloves, we provide the operator with haptic feedback about forces acting at the points of contact in the form of vibration signals. We propose two different methods for estimating these forces; one based on estimating the movement of markers on the sensor surface and one deep-learning approach. Additionally, we integrate our system into a virtual-reality teleoperation pipeline in which a human operator controls both arms of a Tiago robot while receiving visual and haptic feedback. We believe that integrating haptic feedback is a crucial step for dexterous manipulation in teleoperated robotic systems.

Via

Access Paper or Ask Questions

Clustering of Motion Trajectories by a Distance Measure Based on Semantic Features

Apr 26, 2024

Christoph Zelch, Jan Peters, Oskar von Stryk

Abstract:Clustering of motion trajectories is highly relevant for human-robot interactions as it allows the anticipation of human motions, fast reaction to those, as well as the recognition of explicit gestures. Further, it allows automated analysis of recorded motion data. Many clustering algorithms for trajectories build upon distance metrics that are based on pointwise Euclidean distances. However, our work indicates that focusing on salient characteristics is often sufficient. We present a novel distance measure for motion plans consisting of state and control trajectories that is based on a compressed representation built from their main features. This approach allows a flexible choice of feature classes relevant to the respective task. The distance measure is used in agglomerative hierarchical clustering. We compare our method with the widely used dynamic time warping algorithm on test sets of motion plans for the Furuta pendulum and the Manutec robot arm and on real-world data from a human motion dataset. The proposed method demonstrates slight advantages in clustering and strong advantages in runtime, especially for long trajectories.

* 2023 IEEE-RAS 22nd International Conference on Humanoid Robots (Humanoids), Austin, TX, USA, 2023
* Published in: 2023 IEEE-RAS 22nd International Conference on Humanoid Robots (Humanoids). Code available at: https://github.com/cztuda/semantic-feature-clustering

Via

Access Paper or Ask Questions