Vincent Moens

RoboHive: A Unified Framework for Robot Learning

Oct 10, 2023
Vikash Kumar, Rutav Shah, Gaoyue Zhou, Vincent Moens, Vittorio Caggiano, Jay Vakil, Abhishek Gupta, Aravind Rajeswaran

We present RoboHive, a comprehensive software platform and ecosystem for research in robot learning and embodied artificial intelligence. The platform encompasses a diverse range of pre-existing and novel environments, including dexterous manipulation with the Shadow Hand, whole-arm manipulation with Franka and Fetch robots, quadruped locomotion, and more. The included environments are organized into, and cover, multiple domains such as hand manipulation, locomotion, multi-task, multi-agent, and musculoskeletal control. In comparison to prior work, RoboHive offers a streamlined and unified task interface that depends only on a minimal set of well-maintained packages, features tasks with high physics fidelity and rich visual diversity, and supports common hardware drivers for real-world deployment. The unified interface offers a convenient and accessible abstraction for algorithmic research in imitation, reinforcement, multi-task, and hierarchical learning. Furthermore, RoboHive includes expert demonstrations and baseline results for most environments, providing a standard for benchmarking and comparison. Details: https://sites.google.com/view/robohive

* Accepted at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023), Datasets and Benchmarks Track
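
As a minimal sketch of the unified task interface (assuming the released package registers its tasks through the standard Gym API; the environment id below is illustrative and may not match the shipped task names exactly):

```python
# Minimal sketch: RoboHive tasks are exposed through the Gym API.
# The environment id is illustrative; see the RoboHive docs for released names.
import gym
import robohive  # noqa: F401  -- importing registers the RoboHive environments

env = gym.make("FrankaReachRandom-v0")
obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()         # placeholder random policy
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()
```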

TorchRL: A data-driven decision-making library for PyTorch

Jun 01, 2023
Albert Bou, Matteo Bettini, Sebastian Dittert, Vikash Kumar, Shagun Sodhani, Xiaomeng Yang, Gianni De Fabritiis, Vincent Moens

Striking a balance between integration and modularity is crucial for a machine learning library to be versatile and user-friendly, especially when handling decision and control tasks that involve large development teams and complex, real-world data and environments. To address this issue, we propose TorchRL, a general-purpose control library for PyTorch that provides well-integrated, yet standalone, components. With a versatile and robust primitive design, TorchRL facilitates streamlined algorithm development across the many branches of Reinforcement Learning (RL) and control. We introduce a new PyTorch primitive, TensorDict, as a flexible data carrier that empowers the integration of the library's components while preserving their modularity. Hence, replay buffers, datasets, distributed data collectors, environments, transforms, and objectives can be used effortlessly, either in isolation or in combination. We provide a detailed description of the building blocks, supporting code examples, and an extensive overview of the library across domains and tasks. Finally, we show comparative benchmarks to demonstrate its computational efficiency. TorchRL fosters long-term support and is publicly available for greater reproducibility and collaboration within the research community; the code is open-sourced at https://github.com/pytorch/rl.
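
To make the role of TensorDict concrete, here is a short, hedged sketch: a TensorDict is a batched, dictionary-like tensor container, and TorchRL environments read from and write to it, so the same object can flow into replay buffers, transforms, and objectives. The Gym environment name assumes a Gym/Gymnasium installation.

```python
import torch
from tensordict import TensorDict
from torchrl.envs.libs.gym import GymEnv

# A TensorDict carries batched tensors under named keys.
td = TensorDict(
    {"observation": torch.randn(4, 3), "action": torch.randn(4, 1)},
    batch_size=[4],
)
print(td["observation"].shape)   # torch.Size([4, 3])

# Environments produce and consume TensorDicts, so a rollout is itself
# a TensorDict that can be pushed into a replay buffer or a loss module.
env = GymEnv("Pendulum-v1")      # assumes gym/gymnasium is installed
rollout = env.rollout(max_steps=10)
print(rollout.batch_size)        # torch.Size([10])
```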

CACTI: A Framework for Scalable Multi-Task Multi-Scene Visual Imitation Learning

Dec 12, 2022
Zhao Mandi, Homanga Bharadhwaj, Vincent Moens, Shuran Song, Aravind Rajeswaran, Vikash Kumar

Developing robots capable of many skills and of generalizing to unseen scenarios requires progress on two fronts: efficient collection of large and diverse datasets, and training of high-capacity policies on the collected data. While large datasets have propelled progress in other fields such as computer vision and natural language processing, collecting data of comparable scale is particularly challenging for physical systems such as robots. In this work, we propose a framework to bridge this gap and better scale up robot learning, through the lens of multi-task, multi-scene robot manipulation in kitchen environments. Our framework, named CACTI, has four stages that separately handle data collection, data augmentation, visual representation learning, and imitation policy training. Within the CACTI framework, we highlight the benefit of adapting state-of-the-art models for image generation in the augmentation stage, and the significant improvement in training efficiency from using pretrained out-of-domain visual representations in the compression stage. Experimentally, we demonstrate that 1) on a real robot setup, CACTI enables efficient training of a single policy that is capable of 10 manipulation tasks involving kitchen objects and is robust to varying layouts of distractor objects; and 2) in a simulated kitchen environment, CACTI trains a single policy on 18 semantic tasks across up to 50 layout variations per task. The simulation task benchmark and the augmented datasets in both real and simulated environments will be released to facilitate future research.
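
A hypothetical, highly simplified outline of the four stages and their data flow is sketched below; every name is a placeholder for illustration, not the released CACTI code.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Demo:
    frames: list   # raw RGB frames
    actions: list  # robot actions

def collect(tasks: List[str]) -> List[Demo]:
    # Stage 1: a modest number of expert/scripted demonstrations per task.
    return [Demo(frames=[], actions=[]) for _ in tasks]

def augment(demos: List[Demo]) -> List[Demo]:
    # Stage 2: generative image augmentation (e.g. inpainting new distractors
    # and backgrounds) multiplies visual diversity without extra robot time.
    return demos

def compress(demos: List[Demo]) -> List[Demo]:
    # Stage 3: encode frames with a frozen, pretrained out-of-domain
    # visual representation before policy learning.
    return demos

def train(demos: List[Demo]) -> Callable:
    # Stage 4: train a single multi-task imitation policy on the compressed data.
    return lambda observation: None

policy = train(compress(augment(collect(["open_oven", "place_kettle"]))))
```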

Implicit Variational Conditional Sampling with Normalizing Flows

Jul 06, 2021
Vincent Moens, Aivar Sootla, Haitham Bou Ammar, Jun Wang

We present a method for conditional sampling with normalizing flows when only part of an observation is available. We rely on the following fact: if the flow's domain can be partitioned in such a way that the restrictions of the flow to the subdomains retain the bijectivity property, a lower bound on the log-probability of the conditioning variable can be derived. Simulation from the variational conditional flow then amounts to solving an equality constraint. Our contribution is three-fold: a) we provide detailed insights into the choice of variational distributions; b) we propose how to partition the input space of the flow so as to preserve bijectivity; c) we propose a set of methods to optimise the variational distribution in specific cases. Through extensive experiments, we show that our sampling method can be successfully applied to invertible residual networks for inference and classification.
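
To fix notation for the setting above (a generic variational bound, not the paper's specific construction): write an observation as $x = (x_o, x_m)$ with $x_o$ observed and $x_m$ missing, and let $p$ denote the density defined by the normalizing flow. Conditional sampling targets $p(x_m \mid x_o)$, and for any variational conditional $q(x_m \mid x_o)$ the observed-data log-probability is bounded as

$$\log p(x_o) \;\ge\; \mathbb{E}_{q(x_m \mid x_o)}\big[\log p(x_o, x_m) - \log q(x_m \mid x_o)\big].$$

The contribution summarised above is to build $q$ implicitly from restrictions of the flow to subdomains on which bijectivity is preserved, so that drawing a sample reduces to solving an equality constraint.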

Efficient Semi-Implicit Variational Inference

Jan 15, 2021
Vincent Moens, Hang Ren, Alexandre Maraval, Rasul Tutunov, Jun Wang, Haitham Ammar

In this paper, we propose CI-VI, an efficient and scalable solver for semi-implicit variational inference (SIVI). Our method first maps SIVI's evidence lower bound (ELBO) to a form involving a nonlinear functional nesting of expected values, and then develops a rigorous optimiser capable of correctly handling the bias inherent to nonlinear nested expectations, using an extrapolation-smoothing mechanism coupled with gradient sketching. Our theoretical results demonstrate convergence to a stationary point of the ELBO in general non-convex settings, which typically arise when using deep network models, with a gradient-bias-vanishing rate of order $O(t^{-4/5})$. We believe these results generalise beyond the specific nesting arising in SIVI to other forms. Finally, in a set of experiments, we demonstrate the effectiveness of our algorithm in approximating complex posteriors on various datasets, including those from natural language processing.
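
For readers unfamiliar with semi-implicit variational inference, the nesting referred to above can be spelled out in the standard SIVI form (Yin & Zhou, 2018), which may differ slightly from this paper's notation: the variational marginal is a mixture $q_\phi(z) = \mathbb{E}_{q_\phi(\epsilon)}[q_\phi(z \mid \epsilon)]$, so the ELBO

$$\mathcal{L}(\phi) \;=\; \mathbb{E}_{q_\phi(z)}\big[\log p(x, z)\big] \;-\; \mathbb{E}_{q_\phi(z)}\Big[\log \mathbb{E}_{q_\phi(\epsilon)}\big[q_\phi(z \mid \epsilon)\big]\Big]$$

contains a logarithm wrapped around an inner expectation. A naive Monte-Carlo estimate of $\log \mathbb{E}[\cdot]$ is biased, and controlling this bias is precisely what the extrapolation-smoothing mechanism and gradient sketching are designed for.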

SAMBA: Safe Model-Based & Active Reinforcement Learning

Jun 12, 2020
Alexander I. Cowen-Rivers, Daniel Palenicek, Vincent Moens, Mohammed Abdullah, Aivar Sootla, Jun Wang, Haitham Ammar

In this paper, we propose SAMBA, a novel framework for safe reinforcement learning that combines aspects of probabilistic modelling, information theory, and statistics. Our method builds upon PILCO to enable active exploration using novel (semi-)metrics for out-of-sample Gaussian process evaluation, optimised through a multi-objective problem that supports conditional-value-at-risk constraints. We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low- and high-dimensional state representations. Our results show orders-of-magnitude reductions in the number of samples and constraint violations compared to state-of-the-art methods. Lastly, we provide intuition for the effectiveness of the framework through a detailed analysis of our active metrics and safety constraints.
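
For context on the conditional-value-at-risk constraint mentioned above (a generic formulation, not necessarily the exact one used in SAMBA): for a cost random variable $C_\pi$ induced by a policy $\pi$ and a confidence level $\alpha \in (0, 1)$,

$$\mathrm{CVaR}_\alpha[C_\pi] \;=\; \mathbb{E}\big[C_\pi \,\big|\, C_\pi \ge \mathrm{VaR}_\alpha[C_\pi]\big],$$

i.e. the expected cost over the worst $(1-\alpha)$ fraction of outcomes, and the safe-RL problem takes the constrained form

$$\max_\pi \; \mathbb{E}_\pi[\text{return}] \quad \text{s.t.} \quad \mathrm{CVaR}_\alpha[C_\pi] \le d$$

for a safety budget $d$. SAMBA's multi-objective optimisation supports constraints of this type on top of a PILCO-style learned Gaussian-process dynamics model.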

$¶$ILCRO: Making Importance Landscapes Flat Again

Feb 06, 2020
Vincent Moens, Simiao Yu, Gholamreza Salimi-Khorshidi

Convolutional neural networks have had great success in numerous tasks, including image classification, object detection, sequence modelling, and many more. It is generally assumed that such neural networks are translation invariant, meaning that they can detect a given feature independently of its location in the input image. While this holds in simple cases, where networks are composed of a restricted set of layer classes and where the images are fairly simple, common state-of-the-art networks applied to complex images do not usually enjoy this property as one might hope. This paper shows that most existing convolutional architectures define, at initialisation, a specific feature importance landscape that conditions their capacity to attend to different locations of the image later during training, or even at test time. We demonstrate how this phenomenon occurs under specific conditions and how it can be adjusted under some assumptions. We derive the P-objective, or PILCRO for Pixel-wise Importance Landscape Curvature Regularised Objective, a simple regularisation technique that favours weight configurations producing smooth, low-curvature importance landscapes that are conditioned on the data rather than on the chosen architecture. Through extensive experiments, we further show that P-regularised versions of popular computer-vision networks have a flat importance landscape, train faster, achieve better accuracy, and are more robust to noise at test time, compared to their original counterparts in common computer-vision classification settings.
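
The notion of an "importance landscape" can be made concrete with a standard input-gradient saliency map; the snippet below is only a generic illustration of measuring pixel-wise importance, not the paper's P-objective or its curvature regulariser.

```python
import torch
import torchvision.models as models

# Generic pixel-wise importance map via input gradients (illustrative only).
model = models.resnet18(weights=None).eval()
x = torch.randn(1, 3, 224, 224, requires_grad=True)

logits = model(x)
logits.max(dim=1).values.sum().backward()

importance = x.grad.abs().sum(dim=1)   # shape (1, 224, 224): one value per pixel
print(importance.shape)
```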

The Hierarchical Adaptive Forgetting Variational Filter

May 15, 2018
Vincent Moens

A common problem in machine learning and statistics is to detect whether the current sample in a stream of data belongs to the same distribution as previous ones, is an isolated outlier, or inaugurates a new data distribution. We present a hierarchical Bayesian algorithm that learns a time-specific approximate posterior distribution over the parameters describing the distribution of the observed data. We derive the update equations of the variational parameters of the approximate posterior at each time step for models in the exponential family, and show that these updates have interesting correspondences in Reinforcement Learning (RL). From this perspective, our model can be seen as a hierarchical RL algorithm that learns a posterior distribution according to a certain stability confidence, which is, in turn, learned according to its own stability confidence. Finally, we show some applications of our generic model: first in an RL context, next with an adaptive Bayesian Autoregressive model, and finally in the context of Stochastic Gradient Descent optimization.
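
As background for the update scheme described above (a generic forgetful conjugate update, not the paper's full hierarchical scheme): for an exponential-family likelihood with sufficient statistics $T(x)$ and a conjugate prior with natural parameters $\chi_0$, exact Bayesian filtering accumulates $\chi_t = \chi_{t-1} + T(x_t)$; introducing a forgetting weight $w_t \in [0, 1]$ gives

$$\chi_t \;=\; w_t\,\chi_{t-1} + (1 - w_t)\,\chi_0 + T(x_t),$$

so that $w_t \to 1$ trusts the accumulated past while $w_t \to 0$ resets towards the prior. The hierarchical aspect of the model corresponds to learning the stability confidence $w_t$ itself from the data with the same variational machinery, one level up.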
