Joschka Boedecker

Imitation Learning from Nonlinear MPC via the Exact Q-Loss and its Gauss-Newton Approximation

Apr 03, 2023

Andrea Ghezzi, Jasper Hoffman, Jonathan Frey, Joschka Boedecker, Moritz Diehl

Abstract:This work presents a novel loss function for learning nonlinear Model Predictive Control policies via Imitation Learning. Standard approaches to Imitation Learning neglect information about the expert and generally adopt a loss function based on the distance between expert and learned controls. In this work, we present a loss based on the Q-function directly embedding the performance objectives and constraint satisfaction of the associated Optimal Control Problem (OCP). However, training a Neural Network with the Q-loss requires solving the associated OCP for each new sample. To alleviate the computational burden, we derive a second Q-loss based on the Gauss-Newton approximation of the OCP resulting in a faster training time. We validate our losses against Behavioral Cloning, the standard approach to Imitation Learning, on the control of a nonlinear system with constraints. The final results show that the Q-function-based losses significantly reduce the amount of constraint violations while achieving comparable or better closed-loop costs.

* Submitted to the Conference on Decision and Control (CDC) 2023. The paper contains 6 pages.
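
To build intuition for how a Q-function-based loss differs from a Behavioral Cloning loss, the sketch below uses a hypothetical toy OCP: scalar linear dynamics, quadratic costs and no constraints. This problem was chosen only because its Q-function has a closed form via the Riccati recursion, and the sketch is not the authors' implementation. The Q-loss is taken here as Q(x, u) - V(x), i.e. the extra cost of applying u first and acting optimally afterwards. In this unconstrained linear-quadratic special case it reduces to a curvature-weighted squared control error. It therefore cannot show the constraint-handling benefit reported in the abstract, and a Gauss-Newton approximation would coincide with the exact problem.

```python
from dataclasses import dataclass


@dataclass
class ScalarLQ:
    """Hypothetical toy OCP: x+ = a*x + b*u, stage cost q*x^2 + r*u^2,
    terminal cost p_terminal*x^2, no constraints."""
    a: float
    b: float
    q: float
    r: float
    p_terminal: float
    horizon: int

    def cost_to_go(self, steps: int) -> float:
        """Backward Riccati recursion: optimal cost with `steps` stages left is P * x^2."""
        p = self.p_terminal
        for _ in range(steps):
            p = self.q + self.a**2 * p - (self.a * self.b * p) ** 2 / (self.r + self.b**2 * p)
        return p

    def q_value(self, x: float, u: float) -> float:
        """Exact Q(x, u): apply u now, then act optimally for the remaining stages."""
        p_next = self.cost_to_go(self.horizon - 1)
        return self.q * x**2 + self.r * u**2 + p_next * (self.a * x + self.b * u) ** 2

    def value(self, x: float) -> float:
        """Exact V(x) = min_u Q(x, u)."""
        return self.cost_to_go(self.horizon) * x**2

    def expert(self, x: float) -> float:
        """Minimizer of Q(x, .), i.e. the first control of the optimal OCP solution."""
        p_next = self.cost_to_go(self.horizon - 1)
        return -self.a * self.b * p_next * x / (self.r + self.b**2 * p_next)


def bc_loss(ocp: ScalarLQ, x: float, u: float) -> float:
    """Behavioral Cloning: squared distance to the expert control."""
    return (u - ocp.expert(x)) ** 2


def q_loss(ocp: ScalarLQ, x: float, u: float) -> float:
    """Suboptimality of u measured by the OCP itself: Q(x, u) - V(x) >= 0."""
    return ocp.q_value(x, u) - ocp.value(x)


ocp = ScalarLQ(a=1.2, b=0.5, q=1.0, r=0.1, p_terminal=1.0, horizon=10)
x = 2.0
u = ocp.expert(x) + 0.3  # a hypothetical learned control that is off by 0.3
print(bc_loss(ocp, x, u), q_loss(ocp, x, u))
```

Here the BC loss is about 0.09 regardless of the problem, while the Q-loss scales the same error by the curvature r + b^2 * P_1 of the Q-function, so errors are penalized by how much they actually cost.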

Robust Tumor Detection from Coarse Annotations via Multi-Magnification Ensembles

Mar 29, 2023

Mehdi Naouar, Gabriel Kalweit, Ignacio Mastroleo, Philipp Poxleitner, Marc Metzger, Joschka Boedecker, Maria Kalweit

Abstract:Cancer detection and classification from gigapixel whole slide images of stained tissue specimens has recently made enormous progress in computational histopathology. The limited availability of pixel-wise annotated scans has shifted the focus from tumor localization to global slide-level classification based on (weakly supervised) multiple-instance learning, despite the clinical importance of local cancer detection. However, the worse performance of these techniques compared to fully supervised methods has so far limited their use for diagnostic interventions in life-threatening diseases such as cancer. In this work, we put the focus back on tumor localization in the form of a patch-level classification task and adopt the setting of so-called coarse annotations, which provide greater training supervision while remaining feasible from a clinical standpoint. To this end, we present a novel ensemble method that not only significantly improves the detection accuracy of metastases on the open CAMELYON16 data set of sentinel lymph nodes of breast cancer patients, but also considerably increases robustness against noise when training on coarse annotations. Our experiments show that better results can be achieved with our technique, making it clinically feasible for cancer diagnosis and opening a new avenue for translational and clinical research.
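
The abstract does not spell out how the ensemble members are combined. Purely as an illustration, the sketch below assumes soft voting. Each hypothetical model scores the same patches at a different magnification, and their tumor probabilities are averaged before thresholding. The magnification labels and the 0.5 threshold are assumptions.

```python
from statistics import fmean


def ensemble_probabilities(probs_by_magnification: dict[str, list[float]]) -> list[float]:
    """Soft voting (assumed): average per-patch tumor probabilities across magnifications."""
    members = list(probs_by_magnification.values())
    n_patches = len(members[0])
    if any(len(p) != n_patches for p in members):
        raise ValueError("every magnification must score the same patches")
    return [fmean(p[i] for p in members) for i in range(n_patches)]


def detect_tumor(probs_by_magnification: dict[str, list[float]], threshold: float = 0.5) -> list[bool]:
    """Label a patch as tumor when the averaged probability reaches the threshold."""
    return [p >= threshold for p in ensemble_probabilities(probs_by_magnification)]


# Hypothetical scores for two patches from three magnification-specific models.
scores = {"5x": [0.2, 0.9], "10x": [0.4, 0.7], "20x": [0.3, 0.8]}
print(detect_tumor(scores))  # [False, True]
```

Averaging lets a single noisy member be outvoted by the others, which is one plausible way an ensemble can gain robustness to label noise.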

Incorporating Recurrent Reinforcement Learning into Model Predictive Control for Adaptive Control in Autonomous Driving

Jan 30, 2023

Yuan Zhang, Joschka Boedecker, Chuxuan Li, Guyue Zhou

Abstract:Model Predictive Control (MPC) is attracting tremendous attention in autonomous driving as a powerful control technique. The success of an MPC controller strongly depends on an accurate internal dynamics model. However, the static parameters, usually learned by system identification, often fail to adapt to both internal and external perturbations in real-world scenarios. In this paper, we (1) reformulate the problem as a Partially Observed Markov Decision Process (POMDP) that absorbs the uncertainties into observations and maintains the Markov property in hidden states; (2) learn a recurrent policy that continually adapts the parameters of the dynamics model via Recurrent Reinforcement Learning (RRL) for optimal and adaptive control; and (3) evaluate the proposed algorithm (referred to as MPC-RRL) in the CARLA simulator, where it leads to robust behaviours under a wide range of perturbations.

A Hierarchical Approach for Strategic Motion Planning in Autonomous Racing

Dec 03, 2022

Rudolf Reiter, Jasper Hoffmann, Joschka Boedecker, Moritz Diehl

Abstract:We present an approach for safe trajectory planning in which a strategic task related to autonomous racing is learned sample-efficiently within a simulation environment. A high-level policy, represented as a neural network, outputs a reward specification that is used within the cost function of a parametric nonlinear model predictive controller (NMPC). By including constraints and vehicle kinematics in the nonlinear program (NLP), we can guarantee trajectories that are safe and feasible with respect to the model used. Compared to classical reinforcement learning (RL), our approach restricts exploration to safe trajectories, starts with good prior performance, and yields full trajectories that can be passed to a lowest-level tracking controller. We do not address the lowest-level controller in this work and assume perfect tracking of feasible trajectories. We show the superior performance of our algorithm on simulated racing tasks that include high-level decision making. The vehicle learns to efficiently overtake slower vehicles and to avoid being overtaken by blocking faster vehicles.

On the calibration of underrepresented classes in LiDAR-based semantic segmentation

Oct 13, 2022

Mariella Dreissig, Florian Piewak, Joschka Boedecker

Abstract:The calibration of deep learning-based perception models plays a crucial role in their reliability. Our work focuses on a class-wise evaluation of the confidence performance of several models for LiDAR-based semantic segmentation, with the aim of providing insights into the calibration of underrepresented classes. These classes often include vulnerable road users (VRUs) and are thus of particular interest for safety reasons. Using a metric based on sparsification curves, we compare the calibration abilities of three semantic segmentation models with different architectural concepts, each in a deterministic and a probabilistic version. By identifying and describing the dependency between the predictive performance of a class and its calibration quality, we aim to facilitate model selection and refinement for safety-critical applications.
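
The abstract does not give the exact metric. One widely used sparsification-based measure is the area under the sparsification error (AUSE). Points are removed in order of decreasing predicted uncertainty, and the mean error of the remaining points is tracked. The resulting curve is compared with an oracle that removes points in order of their true error. The sketch below assumes per-point 0/1 misclassification errors and one scalar uncertainty score per point. A lower AUSE means the uncertainty ranks errors more like the oracle does.

```python
def sparsification_curve(errors: list[float], scores: list[float], fractions: list[float]) -> list[float]:
    """Mean error of the points kept after removing the given fraction with the highest score."""
    order = sorted(range(len(errors)), key=lambda i: scores[i], reverse=True)
    curve = []
    for f in fractions:
        kept = order[int(round(f * len(errors))):]
        # With everything removed, both curves are set to 0 so that they agree.
        curve.append(sum(errors[i] for i in kept) / len(kept) if kept else 0.0)
    return curve


def ause(errors: list[float], uncertainties: list[float], fractions: list[float]) -> float:
    """Area between the model's sparsification curve and the oracle's (trapezoidal rule)."""
    model = sparsification_curve(errors, uncertainties, fractions)
    oracle = sparsification_curve(errors, errors, fractions)  # removes largest true errors first
    diff = [m - o for m, o in zip(model, oracle)]
    return sum(
        (fractions[k + 1] - fractions[k]) * (diff[k] + diff[k + 1]) / 2
        for k in range(len(fractions) - 1)
    )


# Hypothetical example: the only misclassified point gets the *lowest* uncertainty.
errors = [1.0, 0.0, 0.0, 0.0]
uncertainties = [0.0, 1.0, 2.0, 3.0]
print(round(ause(errors, uncertainties, [0.0, 0.25, 0.5, 0.75]), 4))  # 0.3333
```

Computing this per class (restricting `errors` and `uncertainties` to points of one ground-truth class) gives the class-wise view the abstract describes.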

Latent Plans for Task-Agnostic Offline Reinforcement Learning

Sep 19, 2022

Erick Rosete-Beas, Oier Mees, Gabriel Kalweit, Joschka Boedecker, Wolfram Burgard

Abstract:Everyday tasks that are long-horizon and comprise a sequence of multiple implicit subtasks still pose a major challenge in offline robot control. While a number of prior methods have aimed to address this setting with variants of imitation and offline reinforcement learning, the learned behavior is typically narrow and often struggles to reach configurable long-horizon goals. As both paradigms have complementary strengths and weaknesses, we propose a novel hierarchical approach that combines the strengths of both methods to learn task-agnostic long-horizon policies from high-dimensional camera observations. Concretely, we combine a low-level policy that learns latent skills via imitation learning with a high-level policy, learned from offline reinforcement learning, that chains the latent behavior priors. Experiments in various simulated and real robot control tasks show that our formulation produces previously unseen combinations of skills to reach temporally extended goals by "stitching" together latent skills through goal chaining, with an order-of-magnitude improvement in performance over state-of-the-art baselines. We even learn a single multi-task visuomotor policy for 25 distinct manipulation tasks in the real world, which outperforms both imitation learning and offline reinforcement learning techniques.

* CoRL 2022. Project website: http://tacorl.cs.uni-freiburg.de/

Robust Reinforcement Learning in Continuous Control Tasks with Uncertainty Set Regularization

Jul 05, 2022

Yuan Zhang, Jianhong Wang, Joschka Boedecker

Abstract:Reinforcement learning (RL) is recognized as lacking generalization and robustness under environmental perturbations, which severely restricts its application to real-world robotics. Prior work claimed that adding regularization to the value function is equivalent to learning a robust policy under uncertain transitions. Although this regularization-robustness transformation is appealing for its simplicity and efficiency, it is still lacking for continuous control tasks. In this paper, we propose a new regularizer named Uncertainty Set Regularizer (USR), which formulates the uncertainty set on the parameter space of the transition function. In particular, USR is flexible enough to be plugged into any existing RL framework. To deal with unknown uncertainty sets, we further propose a novel adversarial approach to generate them based on the value function. We evaluate USR on the Real-world Reinforcement Learning (RWRL) benchmark, demonstrating improvements in robust performance in perturbed testing environments.
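
The regularization-robustness link mentioned in the abstract starts from a robust Bellman target, in which the next-state value is evaluated under the worst transition parameters in an uncertainty set. The sketch below illustrates that robust objective only, not the USR regularizer or its adversarial set generation. It evaluates the target by brute-force enumeration over a hypothetical one-dimensional interval of transition parameters. The toy dynamics and value function are assumptions.

```python
from typing import Callable


def robust_td_target(reward: float, state: float, action: float,
                     value_fn: Callable[[float], float],
                     dynamics: Callable[[float, float, float], float],
                     nominal_theta: float, radius: float,
                     gamma: float = 0.99, n_grid: int = 11) -> float:
    """r + gamma * min over theta in [nominal - radius, nominal + radius] of V(f(s, a; theta)).

    The interval is discretized into n_grid >= 2 evenly spaced parameter values."""
    thetas = [nominal_theta - radius + 2.0 * radius * k / (n_grid - 1) for k in range(n_grid)]
    worst_next_value = min(value_fn(dynamics(state, action, th)) for th in thetas)
    return reward + gamma * worst_next_value


def toy_dynamics(s: float, a: float, theta: float) -> float:
    """Hypothetical transition: the action's effect is scaled by an uncertain gain theta."""
    return s + theta * a


def toy_value(s: float) -> float:
    """Hypothetical value function that prefers states near zero."""
    return -s**2


nominal = robust_td_target(1.0, 0.5, 0.5, toy_value, toy_dynamics, nominal_theta=1.0, radius=0.0)
robust = robust_td_target(1.0, 0.5, 0.5, toy_value, toy_dynamics, nominal_theta=1.0, radius=0.2)
print(nominal, robust)  # the robust target is lower (more pessimistic)
```

Enumerating the set like this scales poorly with the number of parameters, which is one motivation for replacing the inner minimization with a regularizer.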

Optimizing Trajectories for Highway Driving with Offline Reinforcement Learning

Mar 21, 2022

Branka Mirchevska, Moritz Werling, Joschka Boedecker

Abstract:Implementing an autonomous vehicle that is able to output feasible, smooth and efficient trajectories is a long-standing challenge. Several approaches have been considered, roughly falling into two categories: rule-based and learning-based. Rule-based approaches, while guaranteeing safety and feasibility, fall short when it comes to long-term planning and generalization. Learning-based approaches can account for long-term planning and generalize to unseen situations, but may fail to achieve the smoothness, safety and feasibility that rule-based approaches ensure. Hence, combining the two approaches is an evident step towards the best compromise between them. We propose a Reinforcement Learning-based approach that learns target trajectory parameters for fully autonomous driving on highways. The trained agent outputs continuous trajectory parameters, from which a feasible polynomial-based trajectory is generated and executed. We compare the performance of our agent against four other highway driving agents. The experiments are conducted in the SUMO simulator, taking into consideration various realistic, dynamically changing highway scenarios, including surrounding vehicles with different driver behaviors. We demonstrate that our agent, trained offline on randomly collected data, learns to drive smoothly, achieving velocities as close as possible to the desired velocity, while outperforming the other agents. Code, training data and details are available at: https://nrgit.informatik.uni-freiburg.de/branka.mirchevska/offline-rl-tp.
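
The abstract states only that the agent outputs continuous trajectory parameters from which a polynomial-based trajectory is generated. A common choice for such trajectories, assumed here rather than taken from the paper, is a quintic lateral profile that starts and ends with zero lateral velocity and acceleration. In this sketch the hypothetical parameters are the target lateral offset and the maneuver duration.

```python
def quintic_lateral_profile(d_start: float, d_target: float, duration: float):
    """Return d(t) moving from d_start to d_target in `duration` seconds with zero lateral
    velocity and acceleration at both ends (the minimum-jerk quintic)."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    delta = d_target - d_start

    def d(t: float) -> float:
        s = min(max(t / duration, 0.0), 1.0)  # normalized time, held at the end points
        return d_start + delta * (10 * s**3 - 15 * s**4 + 6 * s**5)

    return d


def peak_lateral_speed(d_start: float, d_target: float, duration: float) -> float:
    """Lateral speed is 30 s^2 (1 - s)^2 * delta / T, which peaks at s = 0.5: 1.875 |delta| / T."""
    return 1.875 * abs(d_target - d_start) / duration


# Hypothetical lane change of 3.5 m over 4 s.
d = quintic_lateral_profile(0.0, 3.5, 4.0)
print(d(2.0), peak_lateral_speed(0.0, 3.5, 4.0))  # 1.75 1.640625
```

Because the profile is smooth by construction, an agent that only picks the offset and duration cannot produce jerky lateral motion, and a cheap check such as `peak_lateral_speed` can reject infeasible parameter choices before execution.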

Affordance Learning from Play for Sample-Efficient Policy Learning

Mar 01, 2022

Jessica Borja-Diaz, Oier Mees, Gabriel Kalweit, Lukas Hermann, Joschka Boedecker, Wolfram Burgard

Abstract:Robots operating in human-centered environments should have the ability to understand how objects function: what can be done with each object, where this interaction may occur, and how the object is used to achieve a goal. To this end, we propose a novel approach that extracts a self-supervised visual affordance model from human teleoperated play data and leverages it to enable efficient policy learning and motion planning. We combine model-based planning with model-free deep reinforcement learning (RL) to learn policies that favor the same object regions favored by people, while requiring minimal robot interactions with the environment. We evaluate our algorithm, Visual Affordance-guided Policy Optimization (VAPO), on both diverse simulated manipulation tasks and real-world robot tidy-up experiments to demonstrate the effectiveness of our affordance-guided policies. We find that our policies train 4x faster than the baselines and generalize better to novel objects because our visual affordance model can anticipate their affordance regions.

* Accepted at the 2022 IEEE International Conference on Robotics and Automation (ICRA). Videos at http://vapo.cs.uni-freiburg.de/

Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning

Nov 24, 2021

Nicolai Dorka, Joschka Boedecker, Wolfram Burgard

Abstract:Accurate value estimates are important for off-policy reinforcement learning. Algorithms based on temporal difference learning are typically prone to an over- or underestimation bias that builds up over time. In this paper, we propose a general method called Adaptively Calibrated Critics (ACC) that uses the most recent high-variance but unbiased on-policy rollouts to alleviate the bias of the low-variance temporal difference targets. We apply ACC to Truncated Quantile Critics, an algorithm for continuous control that allows the bias to be regulated with a hyperparameter tuned per environment. The resulting algorithm adaptively adjusts this parameter during training, rendering hyperparameter search unnecessary, and sets a new state of the art on the OpenAI Gym continuous control benchmark among all algorithms that do not tune hyperparameters for each environment. Additionally, we demonstrate that ACC is quite general by further applying it to TD3 and showing improved performance in this setting as well.
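
As described in the abstract, the core idea of ACC is to compare critic estimates with high-variance but unbiased Monte Carlo returns from recent on-policy rollouts, then use the gap to tune a bias-controlling hyperparameter. The sketch below is a simplified illustration of that idea under stated assumptions, not the paper's exact update rule. `beta` is a hypothetical parameter where larger values mean more truncation (lower, more pessimistic targets). It is nudged by the normalized average overestimation and clipped to a fixed, assumed range.

```python
from statistics import fmean


def discounted_returns(rewards: list[float], gamma: float) -> list[float]:
    """Monte Carlo returns G_t = r_t + gamma * G_{t+1} for one finished rollout."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


def adjust_bias_parameter(beta: float, q_estimates: list[float], returns: list[float],
                          step_size: float = 0.1, beta_min: float = 0.0,
                          beta_max: float = 5.0) -> float:
    """Hypothetical ACC-style update: raise beta (more truncation, lower targets) when
    critic estimates exceed the observed returns, lower it when they fall short."""
    overestimation = fmean(q - g for q, g in zip(q_estimates, returns))
    scale = fmean(abs(g) for g in returns) or 1.0  # normalize by the return magnitude
    beta += step_size * overestimation / scale
    return min(max(beta, beta_min), beta_max)


rewards = [1.0, 1.0, 1.0]
returns = discounted_returns(rewards, gamma=0.5)  # [1.75, 1.5, 1.0]
critic = [2.5, 2.0, 1.2]                          # hypothetical Q-estimates, all too high
print(adjust_bias_parameter(2.0, critic, returns))  # larger than 2.0
```

Normalizing by the return magnitude keeps the step size roughly comparable across environments with very different reward scales, which matches the abstract's goal of avoiding per-environment tuning.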
