Gerhard Neumann

Karlsruhe Institute of Technology

Sequential Controlled Langevin Diffusions

Dec 10, 2024

Junhua Chen, Lorenz Richter, Julius Berner, Denis Blessing, Gerhard Neumann, Anima Anandkumar

Abstract: An effective approach for sampling from unnormalized densities is based on the idea of gradually transporting samples from an easy prior to the complicated target distribution. Two popular methods are (1) Sequential Monte Carlo (SMC), where the transport is performed through successive annealed densities via prescribed Markov chains and resampling steps, and (2) recently developed diffusion-based sampling methods, where a learned dynamical transport is used. Despite their common goal, the two approaches have different, often complementary, advantages and drawbacks. The resampling steps in SMC allow focusing on promising regions of the space, often leading to robust performance. While the algorithm enjoys asymptotic guarantees, the lack of flexible, learnable transitions can lead to slow convergence. On the other hand, diffusion-based samplers are learned and can potentially better adapt themselves to the target at hand, yet often suffer from training instabilities. In this work, we present a principled framework for combining SMC with diffusion-based samplers by viewing both methods in continuous time and considering measures on path space. This culminates in the new Sequential Controlled Langevin Diffusion (SCLD) sampling method, which is able to utilize the benefits of both methods and reaches improved performance on multiple benchmark problems, in many cases using only 10% of the training budget of previous diffusion-based samplers.
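
As a rough illustration of the resampling step that the abstract attributes to SMC, the sketch below normalizes log-weights, measures weight degeneracy with the effective sample size, and applies systematic resampling. It is a generic textbook SMC component, not the SCLD algorithm; the function names and the explicit offset argument `u` (normally drawn uniformly from [0, 1)) are assumptions made here for illustration.

```python
import math


def normalize_log_weights(log_w):
    """Turn unnormalized log-weights into weights that sum to 1."""
    m = max(log_w)  # subtract the max for numerical stability
    unnorm = [math.exp(lw - m) for lw in log_w]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalized weights; ranges from 1 to N."""
    return 1.0 / sum(w * w for w in weights)


def systematic_resample(weights, u):
    """Return particle indices chosen by systematic resampling.

    `u` is a single offset in [0, 1); in practice it is sampled uniformly.
    """
    n = len(weights)
    indices = []
    cumulative = weights[0]
    j = 0
    for i in range(n):
        position = (u + i) / n  # evenly spaced probes through the CDF
        while position > cumulative and j < n - 1:
            j += 1
            cumulative += weights[j]
        indices.append(j)
    return indices


if __name__ == "__main__":
    w = normalize_log_weights([0.0, -1.0, -5.0, 0.5])
    print("ESS:", effective_sample_size(w))
    print("resampled indices:", systematic_resample(w, u=0.5))
```

A common design choice is to resample only when the effective sample size falls below a threshold such as half the particle count, which avoids adding resampling noise when the weights are still well balanced.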

Enhancing Exploration with Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation

Nov 22, 2024

Huy Le, Miroslav Gabriel, Tai Hoang, Gerhard Neumann, Ngo Anh Vien

Abstract: Learning diverse policies for non-prehensile manipulation is essential for improving skill transfer and generalization to out-of-distribution scenarios. In this work, we enhance exploration through a two-fold approach within a hybrid framework that tackles both discrete and continuous action spaces. First, we model the continuous motion parameter policy as a diffusion model, and second, we incorporate this into a maximum entropy reinforcement learning framework that unifies both the discrete and continuous components. The discrete action space, such as contact point selection, is optimized through Q-value function maximization, while the continuous part is guided by a diffusion-based policy. This hybrid approach leads to a principled objective, where the maximum entropy term is derived as a lower bound using structured variational inference. We propose the Hybrid Diffusion Policy algorithm (HyDo) and evaluate its performance on both simulation and zero-shot sim2real tasks. Our results show that HyDo encourages more diverse behavior policies, leading to significantly improved success rates across tasks, for example, increasing from 53% to 72% on a real-world 6D pose alignment task. Project page: https://leh2rng.github.io/hydo

* 8 pages

Extended Neural Contractive Dynamical Systems: On Multiple Tasks and Riemannian Safety Regions

Nov 20, 2024

Hadi Beik Mohammadi, Søren Hauberg, Georgios Arvanitidis, Gerhard Neumann, Leonel Rozo

Abstract: Stability guarantees are crucial when ensuring that a fully autonomous robot does not take undesirable or potentially harmful actions. We recently proposed Neural Contractive Dynamical Systems (NCDS), a neural network architecture that guarantees contractive stability. With this, learning-from-demonstrations approaches can trivially provide stability guarantees. However, our early work left several unanswered questions, which we address here. Beyond providing an in-depth explanation of NCDS, this paper extends the framework with more careful regularization, a conditional variant of the framework for handling multiple tasks, and an uncertainty-driven approach to latent obstacle avoidance. Experiments verify that the developed system has the flexibility of ordinary neural networks while providing the stability guarantees needed for autonomous robotics.

* arXiv admin note: substantial text overlap with arXiv:2401.09352

BMP: Bridging the Gap between B-Spline and Movement Primitives

Nov 15, 2024

Weiran Liao, Ge Li, Hongyi Zhou, Rudolf Lioutikov, Gerhard Neumann

Abstract: This work introduces B-spline Movement Primitives (BMPs), a new Movement Primitive (MP) variant that leverages B-splines for motion representation. B-splines are a well-known concept in motion planning due to their ability to generate complex, smooth trajectories with only a few control points while satisfying boundary conditions, i.e., passing through a specified desired position with desired velocity. However, current uses of B-splines tend to ignore the higher-order statistics in trajectory distributions, which limits their use in imitation learning (IL) and reinforcement learning (RL), where modeling trajectory distributions is essential. In contrast, MPs are commonly used in IL and RL for their capacity to capture trajectory likelihoods and correlations. However, MPs are limited in their ability to satisfy boundary conditions and usually need extra terms in learning objectives to satisfy velocity constraints. By reformulating B-splines as MPs, represented through basis functions and weight parameters, BMPs combine the strengths of both approaches, allowing B-splines to capture higher-order statistics while retaining their ability to satisfy boundary conditions. Empirical results in IL and RL demonstrate that BMPs broaden the applicability of B-splines in robot learning and offer greater expressiveness compared to existing MP variants.
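
To make the "basis functions and weight parameters" view concrete, the sketch below evaluates a one-dimensional trajectory as a weighted sum of B-spline basis functions using the standard Cox-de Boor recursion. It is a generic B-spline evaluation and not the BMP formulation from the paper; the function names, the clamped knot vector, and the example weights are assumptions chosen for illustration.

```python
def bspline_basis(i, degree, knots, t):
    """Cox-de Boor recursion for the i-th B-spline basis function at time t."""
    if degree == 0:
        # Piecewise-constant basis on the half-open interval [knots[i], knots[i+1])
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left_den = knots[i + degree] - knots[i]
    right_den = knots[i + degree + 1] - knots[i + 1]
    left = 0.0
    right = 0.0
    if left_den != 0:  # repeated knots give zero-width spans; skip those terms
        left = (t - knots[i]) / left_den * bspline_basis(i, degree - 1, knots, t)
    if right_den != 0:
        right = (knots[i + degree + 1] - t) / right_den * bspline_basis(
            i + 1, degree - 1, knots, t
        )
    return left + right


def bspline_trajectory(weights, degree, knots, t):
    """Trajectory value y(t) = sum_i w_i * B_i(t)."""
    return sum(w * bspline_basis(i, degree, knots, t) for i, w in enumerate(weights))


if __name__ == "__main__":
    degree = 2
    # Clamped knot vector for 4 weights (length = 4 + degree + 1)
    knots = [0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0]
    weights = [2.0, 5.0, -1.0, 3.0]  # hypothetical control points
    for t in (0.0, 0.25, 0.5, 0.75):
        print(t, bspline_trajectory(weights, degree, knots, t))
```

With a clamped knot vector the trajectory starts exactly at the first weight, which is one simple way to see how B-splines can satisfy a position boundary condition. Because of the half-open intervals, this sketch is only valid for `t` strictly below the last knot.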

A Retrospective on the Robot Air Hockey Challenge: Benchmarking Robust, Reliable, and Safe Learning Techniques for Real-world Robotics

Nov 08, 2024

Puze Liu, Jonas Günster, Niklas Funk, Simon Gröger, Dong Chen, Haitham Bou-Ammar, Julius Jankowski, Ante Marić, Sylvain Calinon, Andrej Orsula (+10 more)

Abstract: Machine learning methods have a groundbreaking impact in many application domains, but their application on real robotic platforms is still limited. Despite the many challenges associated with combining machine learning technology with robotics, robot learning remains one of the most promising directions for enhancing the capabilities of robots. When deploying learning-based approaches on real robots, extra effort is required to address the challenges posed by various real-world factors. To investigate the key factors influencing real-world deployment and to encourage original solutions from different researchers, we organized the Robot Air Hockey Challenge at the NeurIPS 2023 conference. We selected the air hockey task as a benchmark, encompassing low-level robotics problems and high-level tactics. Unlike other machine learning-centric benchmarks, participants need to tackle practical challenges in robotics, such as the sim-to-real gap, low-level control issues, safety problems, real-time requirements, and the limited availability of real-world data. Furthermore, we focus on a dynamic environment, removing the typical assumption of quasi-static motions of other real-world benchmarks. The competition's results show that solutions combining learning-based approaches with prior knowledge outperform those relying solely on data when real-world deployment is challenging. Our ablation study reveals which real-world factors may be overlooked when building a learning-based solution. The successful real-world air hockey deployment of best-performing agents sets the foundation for future competitions and follow-up research directions.

* Accepted at NeurIPS 2024 Datasets and Benchmarks Track

Adaptive World Models: Learning Behaviors by Latent Imagination Under Non-Stationarity

Nov 02, 2024

Emiliyan Gospodinov, Vaisakh Shaj, Philipp Becker, Stefan Geyer, Gerhard Neumann

Abstract: Developing foundational world models is a key research direction for embodied intelligence, with the ability to adapt to non-stationary environments being a crucial criterion. In this work, we introduce a new formalism, Hidden Parameter-POMDP, designed for control with adaptive world models. We demonstrate that this approach enables learning robust behaviors across a variety of non-stationary RL benchmarks. Additionally, this formalism effectively learns task abstractions in an unsupervised manner, resulting in structured, task-aware latent spaces.

* Accepted at NeurIPS 2024 Workshop Adaptive Foundation Models

Diffusion for Multi-Embodiment Grasping

Oct 24, 2024

Roman Freiberg, Alexander Qualmann, Ngo Anh Vien, Gerhard Neumann

Abstract: Grasping is a fundamental skill in robotics with diverse applications across medical, industrial, and domestic domains. However, current approaches for predicting valid grasps are often tailored to specific grippers, limiting their applicability when gripper designs change. To address this limitation, we explore the transfer of grasping strategies between various gripper designs, enabling the use of data from diverse sources. In this work, we present an approach based on equivariant diffusion that facilitates gripper-agnostic encoding of scenes containing graspable objects and gripper-aware decoding of grasp poses by integrating gripper geometry into the model. We also develop a dataset generation framework that produces cluttered scenes with variable-sized object heaps, improving the training of grasp synthesis methods. Experimental evaluation on diverse object datasets demonstrates the generalizability of our approach across gripper architectures, ranging from simple parallel-jaw grippers to humanoid hands, outperforming both single-gripper and multi-gripper state-of-the-art methods.

* 8 pages

PointPatchRL -- Masked Reconstruction Improves Reinforcement Learning on Point Clouds

Oct 24, 2024

Balázs Gyenes, Nikolai Franke, Philipp Becker, Gerhard Neumann

Abstract: Perceiving the environment via cameras is crucial for Reinforcement Learning (RL) in robotics. While images are a convenient form of representation, they often complicate extracting important geometric details, especially with varying geometries or deformable objects. In contrast, point clouds naturally represent this geometry and easily integrate color and positional data from multiple camera views. However, while deep learning on point clouds has seen many recent successes, RL on point clouds is under-researched, with only the simplest encoder architecture considered in the literature. We introduce PointPatchRL (PPRL), a method for RL on point clouds that builds on the common paradigm of dividing point clouds into overlapping patches, tokenizing them, and processing the tokens with transformers. PPRL provides significant improvements compared with other point-cloud processing architectures previously used for RL. We then complement PPRL with masked reconstruction for representation learning and show that our method outperforms strong model-free and model-based baselines on image observations in complex manipulation tasks containing deformable objects and variations in target object geometry. Videos and code are available at https://alrhub.github.io/pprl-website

* 18 pages, 15 figures, accepted for publication at the 8th Conference on Robot Learning (CoRL 2024)

Learning Sub-Second Routing Optimization in Computer Networks requires Packet-Level Dynamics

Oct 14, 2024

Andreas Boltres, Niklas Freymuth, Patrick Jahnke, Holger Karl, Gerhard Neumann

Abstract: Finding efficient routes for data packets is an essential task in computer networking. The optimal routes depend greatly on the current network topology, state and traffic demand, and they can change within milliseconds. Reinforcement Learning can help to learn network representations that provide routing decisions for possibly novel situations. So far, this has commonly been done using fluid network models. We investigate their suitability for millisecond-scale adaptations with a range of traffic mixes and find that packet-level network models are necessary to capture true dynamics, in particular in the presence of TCP traffic. To this end, we present PackeRL, the first packet-level Reinforcement Learning environment for routing in generic network topologies. Our experiments confirm that learning-based strategies that have been trained in fluid environments do not generalize well to this more realistic, but more challenging setup. Hence, we also introduce two new algorithms for learning sub-second Routing Optimization. We present M-Slim, a dynamic shortest-path algorithm that excels at high traffic volumes but is computationally hard to scale to large network topologies, and FieldLines, a novel next-hop policy design that re-optimizes routing for any network topology within milliseconds without requiring any re-training. Both algorithms outperform current learning-based approaches as well as commonly used static baseline protocols in scenarios with high traffic volumes. All findings are backed by extensive experiments in realistic network conditions in our fast and versatile training and evaluation framework.

* Accepted at Transactions of Machine Learning Research (TMLR) 2024

TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning

Oct 12, 2024

Ge Li, Dong Tian, Hongyi Zhou, Xinkai Jiang, Rudolf Lioutikov, Gerhard Neumann

Abstract: This work introduces Transformer-based Off-Policy Episodic Reinforcement Learning (TOP-ERL), a novel algorithm that enables off-policy updates in the ERL framework. In ERL, policies predict entire action trajectories over multiple time steps instead of single actions at every time step. These trajectories are typically parameterized by trajectory generators such as Movement Primitives (MP), allowing for smooth and efficient exploration over long horizons while capturing high-level temporal correlations. However, ERL methods are often constrained to on-policy frameworks due to the difficulty of evaluating state-action values for entire action sequences, limiting their sample efficiency and preventing the use of more efficient off-policy architectures. TOP-ERL addresses this shortcoming by segmenting long action sequences and estimating the state-action values for each segment using a transformer-based critic architecture alongside an n-step return estimation. These contributions result in efficient and stable training, as reflected in the empirical results on sophisticated robot learning environments. TOP-ERL significantly outperforms state-of-the-art RL methods. Thorough ablation studies additionally show the impact of key design choices on the model performance.
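
The two ingredients named in the abstract, segmenting a long action sequence and an n-step return target, can be sketched generically as follows. This is the standard n-step return with a bootstrapped value, not the TOP-ERL critic itself; the function names, the fixed segment length, and the scalar `bootstrap_value` standing in for a learned critic are assumptions for illustration.

```python
def split_into_segments(sequence, segment_length):
    """Split a sequence into consecutive segments; the last one may be shorter."""
    return [
        sequence[i:i + segment_length]
        for i in range(0, len(sequence), segment_length)
    ]


def n_step_return(rewards, bootstrap_value, gamma):
    """Compute sum_k gamma^k * r_k + gamma^n * V, with n = len(rewards).

    `bootstrap_value` stands in for a critic's value estimate at the state
    reached after the n rewards (hypothetical here).
    """
    g = bootstrap_value
    for r in reversed(rewards):  # fold from the end of the segment backwards
        g = r + gamma * g
    return g


if __name__ == "__main__":
    rewards = [1.0, 0.0, 2.0, 1.0, 0.5]
    for segment in split_into_segments(rewards, 2):
        print(segment, n_step_return(segment, bootstrap_value=0.0, gamma=0.99))
```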
