Raghu Rajan

T3VIP: Transformation-based 3D Video Prediction

Sep 19, 2022
Iman Nematollahi, Erick Rosete-Beas, Seyed Mahdi B. Azad, Raghu Rajan, Frank Hutter, Wolfram Burgard

For autonomous skill acquisition, robots have to learn about the physical rules governing 3D world dynamics from their own past experience in order to predict and reason about plausible future outcomes. To this end, we propose a transformation-based 3D video prediction (T3VIP) approach that explicitly models 3D motion by decomposing a scene into its object parts and predicting their corresponding rigid transformations. Our model is fully unsupervised, captures the stochastic nature of the real world, and uses observational cues in the image and point cloud domains as its learning signals. To fully leverage all the 2D and 3D observational signals, we equip our model with automatic hyperparameter optimization (HPO) to determine the best way of learning from them. To the best of our knowledge, our model is the first generative model that provides RGB-D video prediction of the future for a static camera. Our extensive evaluation on simulated and real-world datasets demonstrates that our formulation leads to interpretable 3D models that predict future depth videos while achieving performance on par with 2D models on RGB video prediction. Moreover, we demonstrate that our model outperforms 2D baselines on visuomotor control. Videos, code, dataset, and pre-trained models are available at http://t3vip.cs.uni-freiburg.de.

* Accepted at the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 
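
Below is a minimal sketch of the core transformation-based idea described above: warp the current point cloud by a mask-weighted blend of per-object rigid transforms to obtain the predicted next frame. All names, shapes, and the toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def warp_point_cloud(points, rotations, translations, masks):
    """points: (N, 3) points of the current frame.
    rotations: (K, 3, 3) per-object rotation matrices.
    translations: (K, 3) per-object translation vectors.
    masks: (N, K) soft assignment of each point to each object part.
    Returns the predicted (N, 3) point cloud of the next frame."""
    # Apply every object's rigid motion to every point: (K, N, 3).
    transformed = np.einsum('kij,nj->kni', rotations, points) + translations[:, None, :]
    # Blend the K candidate positions with the soft masks: (N, 3).
    return np.einsum('nk,kni->ni', masks, transformed)

# Toy usage: two "objects", one static, one translated along x.
pts = np.random.rand(100, 3)
R = np.stack([np.eye(3), np.eye(3)])
t = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]])
m = np.zeros((100, 2)); m[:50, 0] = 1.0; m[50:, 1] = 1.0
next_pts = warp_point_cloud(pts, R, t, m)
```

In T3VIP the decomposition masks and rigid transformations are predicted by the learned model; here they are supplied by hand purely to show how a scene decomposition and rigid motions combine into a 3D prediction.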

Automated Reinforcement Learning (AutoRL): A Survey and Open Problems

Jan 11, 2022
Jack Parker-Holder, Raghu Rajan, Xingyou Song, André Biedenkapp, Yingjie Miao, Theresa Eimer, Baohe Zhang, Vu Nguyen, Roberto Calandra, Aleksandra Faust, Frank Hutter, Marius Lindauer

The combination of Reinforcement Learning (RL) with deep learning has led to a series of impressive feats, with many believing (deep) RL provides a path towards generally capable agents. However, the success of RL agents is often highly sensitive to design choices in the training process, which may require tedious and error-prone manual tuning. This makes it challenging to use RL for new problems and also limits its full potential. In many other areas of machine learning, AutoML has shown that it is possible to automate such design choices, and it has also yielded promising initial results when applied to RL. However, Automated Reinforcement Learning (AutoRL) involves not only standard applications of AutoML but also additional challenges unique to RL that naturally produce a different set of methods. As such, AutoRL has been emerging as an important area of research in RL, showing promise in a variety of applications, from RNA design to playing games such as Go. Given the diversity of methods and environments considered in RL, much of the research has been conducted in distinct subfields, ranging from meta-learning to evolution. In this survey, we seek to unify the field of AutoRL: we provide a common taxonomy, discuss each area in detail, and pose open problems of interest to researchers going forward.
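
As a toy illustration of the basic AutoRL premise (not an example from the survey), the sketch below treats RL design choices as hyperparameters and tunes them with plain random search: tabular Q-learning on a small chain MDP, with the learning rate, discount factor, and exploration rate as the search space. The environment, ranges, and budget are all assumptions.

```python
import numpy as np

def run_q_learning(alpha, gamma, epsilon, n_states=5, episodes=300, seed=0):
    """Train tabular Q-learning on a short chain MDP and report late-training return."""
    rng = np.random.default_rng(seed)
    q = np.zeros((n_states, 2))                # actions: 0 = left, 1 = right
    returns = []
    for _ in range(episodes):
        s, total = 0, 0.0
        for _ in range(2 * n_states):
            if rng.random() < epsilon:
                a = int(rng.integers(2))
            else:                              # greedy with random tie-breaking
                a = int(rng.choice(np.flatnonzero(q[s] == q[s].max())))
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == n_states - 1 else 0.0
            q[s, a] += alpha * (r + gamma * q[s_next].max() - q[s, a])
            s, total = s_next, total + r
            if r > 0:                          # goal reached, end episode
                break
        returns.append(total)
    return float(np.mean(returns[-100:]))

# Random search over the RL hyperparameters.
rng = np.random.default_rng(42)
best = max(
    ({'alpha': rng.uniform(0.01, 1.0),
      'gamma': rng.uniform(0.8, 0.999),
      'epsilon': rng.uniform(0.01, 0.3)} for _ in range(20)),
    key=lambda cfg: run_q_learning(**cfg),
)
print('best config:', best)
```

The AutoRL methods covered in the survey go well beyond random search (e.g., evolutionary and meta-learning approaches), but the interface is the same: a configuration goes in, a performance estimate comes out.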

TempoRL: Learning When to Act

Jun 09, 2021
André Biedenkapp, Raghu Rajan, Frank Hutter, Marius Lindauer

Reinforcement learning is a powerful approach to learning behaviour through interactions with an environment. However, behaviours are usually learned in a purely reactive fashion, where an appropriate action is selected based on an observation. In this form, it is challenging to learn when it is necessary to execute new decisions. This makes learning inefficient, especially in environments that require varying degrees of fine and coarse control. To address this, we propose a proactive setting in which the agent not only selects an action in a state but also decides how long to commit to that action. Our TempoRL approach introduces skip connections between states and learns a skip-policy for repeating the same action along these skips. We demonstrate the effectiveness of TempoRL on a variety of traditional and deep RL environments, showing that our approach is capable of learning successful policies up to an order of magnitude faster than vanilla Q-learning.

* Accepted at ICML'21 
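
The sketch below is a minimal tabular illustration of the TempoRL idea, not the paper's implementation: alongside a behaviour Q-function over actions, learn a skip Q-function that decides how long to commit to the chosen action, updating it with the n-step return accumulated along the skip. The chain environment and all constants are assumptions.

```python
import numpy as np

N_STATES, N_ACTIONS, MAX_SKIP = 10, 2, 4
GAMMA, ALPHA, EPS = 0.99, 0.5, 0.1
rng = np.random.default_rng(0)

q_action = np.zeros((N_STATES, N_ACTIONS))          # which action to take
q_skip = np.zeros((N_STATES, N_ACTIONS, MAX_SKIP))  # how long to commit to it

def greedy(values):
    """Argmax with random tie-breaking."""
    return int(rng.choice(np.flatnonzero(values == values.max())))

def step(s, a):
    """Toy deterministic chain: action 1 moves right, action 0 moves left."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward, s_next == N_STATES - 1

for episode in range(500):
    s, done, t = 0, False, 0
    while not done and t < 100:
        a = int(rng.integers(N_ACTIONS)) if rng.random() < EPS else greedy(q_action[s])
        j = int(rng.integers(MAX_SKIP)) if rng.random() < EPS else greedy(q_skip[s, a])
        s0, g, discount = s, 0.0, 1.0
        # Commit to action a for j + 1 steps, accumulating the discounted return.
        for _ in range(j + 1):
            s_next, r, done = step(s, a)
            # One-step update of the behaviour Q at every intermediate state.
            q_action[s, a] += ALPHA * (r + GAMMA * (0.0 if done else q_action[s_next].max())
                                       - q_action[s, a])
            g += discount * r
            discount *= GAMMA
            s, t = s_next, t + 1
            if done:
                break
        # n-step update of the skip Q from the state where the skip started.
        target = g + discount * (0.0 if done else q_action[s].max())
        q_skip[s0, a, j] += ALPHA * (target - q_skip[s0, a, j])
```

In this sketch the skip Q-function bootstraps from the behaviour Q-value of the landing state, so a long skip over an uneventful stretch of the chain is credited in a single update, which is what makes committing to an action for several steps pay off.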

On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning

Feb 26, 2021
Baohe Zhang, Raghu Rajan, Luis Pineda, Nathan Lambert, André Biedenkapp, Kurtland Chua, Frank Hutter, Roberto Calandra

Model-based Reinforcement Learning (MBRL) is a promising framework for learning control in a data-efficient manner. MBRL algorithms can be fairly complex due to the separate dynamics modeling and the subsequent planning algorithm, and as a result they often possess tens of hyperparameters and architectural choices. For this reason, MBRL typically requires significant human expertise before it can be applied to new problems and domains. To alleviate this problem, we propose to use automatic hyperparameter optimization (HPO). We demonstrate that this problem can be tackled effectively with automated HPO, which yields significantly improved performance compared to that of human experts. In addition, we show that tuning several MBRL hyperparameters dynamically, i.e., during training itself, further improves performance compared to using static hyperparameters which are kept fixed for the whole training. Finally, our experiments provide valuable insights into the effects of several hyperparameters, such as the plan horizon or the learning rate, and their influence on the stability of training and the resulting rewards.

* 19 pages, accepted by AISTATS 2021 
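
The following sketch shows one way to wire an MBRL training run into an off-the-shelf HPO library; it uses Optuna purely for illustration and is not the tooling or search space from the paper. The train_mbrl function is a hypothetical stand-in that returns a synthetic score so the example runs end to end.

```python
import optuna

def train_mbrl(model_lr, plan_horizon, n_ensemble):
    """Stand-in for a real MBRL run that trains a dynamics model, plans with it,
    and returns the average episode return. Replace with actual training."""
    # Synthetic score shaped to prefer mid-range values; illustration only.
    return -(model_lr - 1e-3) ** 2 - 0.01 * abs(plan_horizon - 25) + 0.1 * n_ensemble

def objective(trial):
    # The plan horizon and learning rate are named in the abstract; the
    # ensemble size and all ranges are extra illustrative assumptions.
    model_lr = trial.suggest_float('model_lr', 1e-5, 1e-2, log=True)
    plan_horizon = trial.suggest_int('plan_horizon', 5, 50)
    n_ensemble = trial.suggest_int('n_ensemble', 1, 7)
    return train_mbrl(model_lr, plan_horizon, n_ensemble)

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50)
print(study.best_params)
```

Dynamic tuning, as discussed in the abstract, would instead adjust hyperparameters during a single training run rather than treating each configuration as a fixed, full-length trial.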

MDP Playground: Meta-Features in Reinforcement Learning

Sep 17, 2019
Raghu Rajan, Frank Hutter

Reinforcement Learning (RL) algorithms usually assume their environment to be a Markov Decision Process (MDP). Additionally, they do not try to identify specific features of environments which could help them perform better. Here, we present a few key meta-features of environments: delayed rewards, specific reward sequences, sparsity of rewards, and stochasticity of environments. These meta-features may violate the MDP assumptions, and adapting to them should help RL agents perform better. Since running RL algorithms on standard benchmarks is very time-consuming, we define a parameterised collection of fast-to-run toy benchmarks in OpenAI Gym by varying these meta-features. Despite their toy nature and low compute requirements, we show that these benchmarks present substantial difficulties to current RL algorithms. Furthermore, since we can generate environments with a desired value for each of the meta-features, we have fine-grained control over the environments' difficulty and also have the ground truth available for evaluating algorithms. We believe that devising algorithms that can detect such meta-features of environments and adapt to them will be key to creating robust RL algorithms that work in a variety of different real-world problems.

* Submitted to NIPS Deep RL Workshop 
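
As a hedged illustration of how such meta-features can be exposed as knobs (this is not MDP Playground's actual API), the wrapper below injects reward delay and reward sparsity into any OpenAI Gym environment that uses the classic 4-tuple step interface.

```python
from collections import deque
import gym

class DelaySparsityWrapper(gym.Wrapper):
    """Adds two tunable meta-features to an environment: rewards below a
    threshold are zeroed out (sparsity) and the remaining rewards are
    delivered a fixed number of steps late (delay)."""

    def __init__(self, env, reward_delay=0, sparsity_threshold=0.0):
        super().__init__(env)
        self.reward_delay = reward_delay
        self.sparsity_threshold = sparsity_threshold
        self._buffer = deque()

    def reset(self, **kwargs):
        self._buffer.clear()
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Sparsify: drop rewards below the threshold.
        reward = reward if abs(reward) >= self.sparsity_threshold else 0.0
        # Delay: queue the reward and release it reward_delay steps later
        # (rewards still queued at episode end are simply dropped).
        self._buffer.append(reward)
        delayed = self._buffer.popleft() if len(self._buffer) > self.reward_delay else 0.0
        return obs, delayed, done, info

env = DelaySparsityWrapper(gym.make('CartPole-v1'), reward_delay=3, sparsity_threshold=0.5)
```

Sweeping reward_delay and sparsity_threshold over a grid gives the kind of fine-grained difficulty control the abstract describes, while a cheap base environment keeps each run fast.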