Olivier Bachem

Google Research

Brax -- A Differentiable Physics Engine for Large Scale Rigid Body Simulation

Jun 24, 2021

C. Daniel Freeman, Erik Frey, Anton Raichuk, Sertan Girgin, Igor Mordatch, Olivier Bachem

Abstract: We present Brax, an open source library for rigid body simulation with a focus on performance and parallelism on accelerators, written in JAX. We present results on a suite of tasks inspired by the existing reinforcement learning literature, but remade in our engine. Additionally, we provide reimplementations of PPO, SAC, ES, and direct policy optimization in JAX that compile alongside our environments, allowing the learning algorithm and the environment processing to occur on the same device, and to scale seamlessly on accelerators. Finally, we include notebooks that facilitate training of performant policies on common OpenAI Gym MuJoCo-like tasks in minutes.

* 9 pages + 12 pages of appendices and references. In submission at NeurIPS 2021 Datasets and Benchmarks Track

Offline Reinforcement Learning as Anti-Exploration

Jun 11, 2021

Shideh Rezaeifar, Robert Dadashi, Nino Vieillard, Léonard Hussenot, Olivier Bachem, Olivier Pietquin, Matthieu Geist

Abstract: Offline Reinforcement Learning (RL) aims at learning an optimal control from a fixed dataset, without interactions with the system. An agent in this setting should avoid selecting actions whose consequences cannot be predicted from the data. This is the converse of exploration in RL, which favors such actions. We thus take inspiration from the literature on bonus-based exploration to design a new offline RL agent. The core idea is to subtract a prediction-based exploration bonus from the reward, instead of adding it for exploration. This allows the policy to stay close to the support of the dataset. We connect this approach to a more common regularization of the learned policy towards the data. Instantiated with a bonus based on the prediction error of a variational autoencoder, we show that our agent is competitive with the state of the art on a set of continuous control locomotion and manipulation tasks.
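
The reward modification described in this abstract can be sketched in a few lines. This is only an illustration under loud assumptions: the variational autoencoder is swapped for a linear autoencoder (PCA) so the example stays self-contained, and the function names, the `alpha` weight, and the toy data are all hypothetical, not taken from the paper.

```python
import numpy as np

def fit_linear_autoencoder(data, n_components):
    """Fit a PCA-style linear autoencoder (a stand-in for the paper's VAE)."""
    mean = data.mean(axis=0)
    # Rows of vt are the principal directions of the centered dataset.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:n_components]

def prediction_bonus(state_action, mean, components):
    """Reconstruction error: small near the dataset support, large away from it."""
    centered = state_action - mean
    reconstruction = centered @ components.T @ components
    return np.linalg.norm(centered - reconstruction, axis=-1)

def anti_exploration_reward(reward, state_action, mean, components, alpha=1.0):
    """Subtract the bonus from the reward instead of adding it (alpha is assumed)."""
    return reward - alpha * prediction_bonus(state_action, mean, components)

# Toy dataset: state-action pairs that all lie on the line x = y.
rng = np.random.default_rng(0)
dataset = rng.normal(size=(100, 1)) * np.array([[1.0, 1.0]])
mean, components = fit_linear_autoencoder(dataset, n_components=1)

in_support = np.array([2.0, 2.0])    # consistent with the data
off_support = np.array([2.0, -2.0])  # far from anything in the data
print(anti_exploration_reward(1.0, in_support, mean, components))
print(anti_exploration_reward(1.0, off_support, mean, components))
```

With the same raw reward, the pair far from the data receives a lower modified reward, which is the mechanism that would discourage a policy from leaving the support of the dataset.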

Concave Utility Reinforcement Learning: the Mean-field Game viewpoint

Jun 09, 2021

Matthieu Geist, Julien Pérolat, Mathieu Laurière, Romuald Elie, Sarah Perrin, Olivier Bachem, Rémi Munos, Olivier Pietquin

Abstract: Concave Utility Reinforcement Learning (CURL) extends RL from linear to concave utilities in the occupancy measure induced by the agent's policy. This encompasses not only RL but also imitation learning and exploration, among others. Yet, this more general paradigm invalidates the classical Bellman equations, and calls for new algorithms. Mean-field Games (MFGs) are a continuous approximation of many-agent RL. They consider the limit case of a continuous distribution of identical agents, anonymous with symmetric interests, and reduce the problem to the study of a single representative agent in interaction with the full population. Our core contribution consists in showing that CURL is a subclass of MFGs. We think this important to bridge together both communities. It also allows to shed light on aspects of both fields: we show the equivalence between concavity in CURL and monotonicity in the associated MFG, between optimality conditions in CURL and Nash equilibrium in MFG, or that Fictitious Play (FP) for this class of MFGs is simply Frank-Wolfe, bringing the first convergence rate for discrete-time FP for MFGs. We also experimentally demonstrate that, using algorithms recently introduced for solving MFGs, we can address the CURL problem more efficiently.
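
The Frank-Wolfe connection named in this abstract can be illustrated on a toy problem. The sketch below is a generic Frank-Wolfe loop, not the paper's algorithm: it assumes there are no dynamics, so the feasible set is the whole probability simplex rather than the set of occupancy measures of an MDP, and it uses entropy as the concave utility. The step size 1/(t+2) makes each iterate a running average of the vertices chosen so far, which is the averaging structure of Fictitious Play. All function names are hypothetical.

```python
import numpy as np

def entropy(mu, eps=1e-12):
    """Concave utility used for this toy example; eps only guards log(0)."""
    return float(-np.sum(mu * np.log(mu + eps)))

def entropy_grad(mu, eps=1e-12):
    """Gradient of -sum(mu * log(mu)); eps only guards log(0)."""
    return -np.log(mu + eps) - 1.0

def frank_wolfe_simplex(grad_fn, n, steps):
    """Maximize a concave function over the probability simplex."""
    mu = np.zeros(n)
    mu[0] = 1.0  # start at an arbitrary vertex
    for t in range(steps):
        # Linear maximization oracle: on the simplex the maximizer of a
        # linear function is the vertex with the largest gradient entry.
        best = int(np.argmax(grad_fn(mu)))
        vertex = np.zeros(n)
        vertex[best] = 1.0
        # Step size 1/(t+2): mu stays the average of all vertices picked so far.
        gamma = 1.0 / (t + 2)
        mu = (1.0 - gamma) * mu + gamma * vertex
    return mu

mu = frank_wolfe_simplex(entropy_grad, n=4, steps=200)
print(mu, entropy(mu), np.log(4))
```

In this toy setting the iterates approach the uniform distribution, the maximizer of entropy on the simplex.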

What Matters for Adversarial Imitation Learning?

Jun 01, 2021

Manu Orsini, Anton Raichuk, Léonard Hussenot, Damien Vincent, Robert Dadashi, Sertan Girgin, Matthieu Geist, Olivier Bachem, Olivier Pietquin, Marcin Andrychowicz

Abstract: Adversarial imitation learning has become a popular framework for imitation in continuous control. Over the years, several variations of its components were proposed to enhance the performance of the learned policies as well as the sample complexity of the algorithm. In practice, these choices are rarely tested all together in rigorous empirical studies. It is therefore difficult to discuss and understand what choices, among the high-level algorithmic options as well as low-level implementation details, matter. To tackle this issue, we implement more than 50 of these choices in a generic adversarial imitation learning framework and investigate their impacts in a large-scale study (>500k trained agents) with both synthetic and human-generated demonstrations. While many of our findings confirm common practices, some of them are surprising or even contradict prior work. In particular, our results suggest that artificial demonstrations are not a good proxy for human data and that the very common practice of evaluating imitation algorithms only with synthetic demonstrations may lead to algorithms which perform poorly in the more realistic scenarios with human demonstrations.

Hyperparameter Selection for Imitation Learning

May 25, 2021

Leonard Hussenot, Marcin Andrychowicz, Damien Vincent, Robert Dadashi, Anton Raichuk, Lukasz Stafiniak, Sertan Girgin, Raphael Marinier, Nikola Momchev, Sabela Ramos (+4 more)

Abstract: We address the issue of tuning hyperparameters (HPs) for imitation learning algorithms in the context of continuous-control, when the underlying reward function of the demonstrating expert cannot be observed at any time. The vast literature in imitation learning mostly considers this reward function to be available for HP selection, but this is not a realistic setting. Indeed, would this reward function be available, it could then directly be used for policy training and imitation would not be necessary. To tackle this mostly ignored problem, we propose a number of possible proxies to the external reward. We evaluate them in an extensive empirical study (more than 10'000 agents across 9 environments) and make practical recommendations for selecting HPs. Our results show that while imitation learning algorithms are sensitive to HP choices, it is often possible to select good enough HPs through a proxy to the reward function.

* ICML 2021

Scaling Hierarchical Agglomerative Clustering to Billion-sized Datasets

May 25, 2021

Baris Sumengen, Anand Rajagopalan, Gui Citovsky, David Simcha, Olivier Bachem, Pradipta Mitra, Sam Blasiak, Mason Liang, Sanjiv Kumar

Abstract: Hierarchical Agglomerative Clustering (HAC) is one of the oldest but still most widely used clustering methods. However, HAC is notoriously hard to scale to large data sets as the underlying complexity is at least quadratic in the number of data points and many algorithms to solve HAC are inherently sequential. In this paper, we propose Reciprocal Agglomerative Clustering (RAC), a distributed algorithm for HAC, that uses a novel strategy to efficiently merge clusters in parallel. We prove theoretically that RAC recovers the exact solution of HAC. Furthermore, under clusterability and balancedness assumption we show provable speedups in total runtime due to the parallelism. We also show that these speedups are achievable for certain probabilistic data models. In extensive experiments, we show that this parallelism is achieved on real world data sets and that the proposed RAC algorithm can recover the HAC hierarchy on billions of data points connected by trillions of edges in less than an hour.

* 19 pages, 3 figures
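
The idea of merging several clusters per round can be sketched on a single machine. The abstract does not spell out the merge rule, so this example assumes the reading suggested by the name: in each round, every pair of clusters that are each other's nearest neighbor is merged. Average linkage, the brute-force distance matrix, and all function names are assumptions made for illustration; the distributed algorithm itself is not reproduced here.

```python
import numpy as np

def average_linkage(points, a, b):
    """Mean pairwise Euclidean distance between clusters a and b (index lists)."""
    diffs = points[a][:, None, :] - points[b][None, :, :]
    return np.linalg.norm(diffs, axis=-1).mean()

def reciprocal_merge_round(points, clusters):
    """Merge every pair of clusters that are mutual nearest neighbors."""
    n = len(clusters)
    if n < 2:
        return clusters
    dist = np.full((n, n), np.inf)  # diagonal stays inf so a cluster never picks itself
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = average_linkage(points, clusters[i], clusters[j])
    nearest = dist.argmin(axis=1)
    merged = []
    for i in range(n):
        j = int(nearest[i])
        if nearest[j] == i:
            if i < j:  # emit each reciprocal pair once
                merged.append(sorted(clusters[i] + clusters[j]))
        else:
            merged.append(clusters[i])  # no reciprocal partner this round
    return merged

def reciprocal_clustering(points):
    """Run rounds until one cluster remains; return the clusters after each round."""
    clusters = [[i] for i in range(len(points))]
    rounds = []
    while len(clusters) > 1:
        clusters = reciprocal_merge_round(points, clusters)
        rounds.append(clusters)
    return rounds

points = np.array([[0.0], [0.1], [5.0], [5.1]])
for level in reciprocal_clustering(points):
    print(level)
```

On these four points the two tight pairs are merged in the same round, whereas a sequential HAC implementation would perform those merges one at a time. The merges inside a round are independent of each other, which is what makes them candidates for parallel execution.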

A Sober Look at the Unsupervised Learning of Disentangled Representations and their Evaluation

Oct 27, 2020

Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf, Olivier Bachem

Abstract: The idea behind the unsupervised learning of disentangled representations is that real-world data is generated by a few explanatory factors of variation which can be recovered by unsupervised learning algorithms. In this paper, we provide a sober look at recent progress in the field and challenge some common assumptions. We first theoretically show that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases on both the models and the data. Then, we train over 14000 models covering most prominent methods and evaluation metrics in a reproducible large-scale experimental study on eight data sets. We observe that while the different methods successfully enforce properties "encouraged" by the corresponding losses, well-disentangled models seemingly cannot be identified without supervision. Furthermore, different evaluation metrics do not always agree on what should be considered "disentangled" and exhibit systematic differences in the estimation. Finally, increased disentanglement does not seem to necessarily lead to a decreased sample complexity of learning for downstream tasks. Our results suggest that future work on disentanglement learning should be explicit about the role of inductive biases and (implicit) supervision, investigate concrete benefits of enforcing disentanglement of the learned representations, and consider a reproducible experimental setup covering several data sets.

* Journal of Machine Learning Research 2020, Volume 21, Number 209
* arXiv admin note: substantial text overlap with arXiv:1811.12359

A Commentary on the Unsupervised Learning of Disentangled Representations

Jul 28, 2020

Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf, Olivier Bachem

Abstract: The goal of the unsupervised learning of disentangled representations is to separate the independent explanatory factors of variation in the data without access to supervision. In this paper, we summarize the results of Locatello et al., 2019, and focus on their implications for practitioners. We discuss the theoretical result showing that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases and the practical challenges it entails. Finally, we comment on our experimental findings, highlighting the limitations of state-of-the-art approaches and directions for future research.

* The Thirty-Fourth AAAI Conference on Artificial Intelligence 2020 (AAAI-20)

What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study

Jun 10, 2020

Marcin Andrychowicz, Anton Raichuk, Piotr Stańczyk, Manu Orsini, Sertan Girgin, Raphael Marinier, Léonard Hussenot, Matthieu Geist, Olivier Pietquin, Marcin Michalski (+2 more)

Abstract: In recent years, on-policy reinforcement learning (RL) has been successfully applied to many different continuous control tasks. While RL algorithms are often conceptually simple, their state-of-the-art implementations take numerous low- and high-level design decisions that strongly affect the performance of the resulting agents. Those choices are usually not extensively discussed in the literature, leading to discrepancy between published descriptions of algorithms and their implementations. This makes it hard to attribute progress in RL and slows down overall progress [Engstrom'20]. As a step towards filling that gap, we implement >50 such "choices" in a unified on-policy RL framework, allowing us to investigate their impact in a large-scale empirical study. We train over 250'000 agents in five continuous control environments of different complexity and provide insights and practical recommendations for on-policy training of RL agents.

Automatic Shortcut Removal for Self-Supervised Representation Learning

Feb 21, 2020

Matthias Minderer, Olivier Bachem, Neil Houlsby, Michael Tschannen

Abstract: In self-supervised visual representation learning, a feature extractor is trained on a "pretext task" for which labels can be generated cheaply. A central challenge in this approach is that the feature extractor quickly learns to exploit low-level visual features such as color aberrations or watermarks and then fails to learn useful semantic representations. Much work has gone into identifying such "shortcut" features and hand-designing schemes to reduce their effect. Here, we propose a general framework for removing shortcut features automatically. Our key assumption is that those features which are the first to be exploited for solving the pretext task may also be the most vulnerable to an adversary trained to make the task harder. We show that this assumption holds across common pretext tasks and datasets by training a "lens" network to make small image changes that maximally reduce performance in the pretext task. Representations learned with the modified images outperform those learned without in all tested cases. Additionally, the modifications made by the lens reveal how the choice of pretext task and dataset affects the features learned by self-supervision.
