Striking a balance between integration and modularity is crucial for a machine learning library to be versatile and user-friendly, especially for decision and control tasks that involve large development teams and complex, real-world data and environments. To address this issue, we propose TorchRL, a general-purpose control library for PyTorch that provides well-integrated yet standalone components. With a versatile and robust primitive design, TorchRL facilitates streamlined algorithm development across the many branches of Reinforcement Learning (RL) and control. We introduce a new PyTorch primitive, TensorDict, as a flexible data carrier that enables the integration of the library's components while preserving their modularity. Hence, replay buffers, datasets, distributed data collectors, environments, transforms, and objectives can be used in isolation or combined with little effort. We provide a detailed description of the building blocks, supporting code examples, and an extensive overview of the library across domains and tasks. Finally, we show comparative benchmarks to demonstrate its computational efficiency. TorchRL fosters long-term support and is open-sourced on GitHub at https://github.com/pytorch/rl for greater reproducibility and collaboration within the research community.
Reinforcement learning (RL) has been very successful in recent years but, limited by its sample inefficiency, often requires large computational resources. While new methods are being investigated to increase the efficiency of RL algorithms, it is critical to enable training at scale while using a codebase flexible enough to allow for method experimentation. Here, we present NAPPO, a PyTorch-based library for RL that provides scalable proximal policy optimization (PPO) implementations in a simple, modular package. We validate it by replicating previous results on MuJoCo and Atari environments. Furthermore, we provide insights into how a variety of distributed training schemes with synchronous and asynchronous communication patterns perform. Finally, we showcase NAPPO by obtaining the highest test performance to date on the Obstacle Tower Unity3D challenge environment. The full source code is available.