Can we pre-train a generalist agent from a large number of unlabeled offline trajectories such that it can be immediately adapted to any new downstream task in a zero-shot manner? In this work, we present functional reward encoding (FRE) as a general, scalable solution to this zero-shot RL problem. Our main idea is to learn functional representations of arbitrary tasks by encoding their state-reward samples using a transformer-based variational auto-encoder. This functional encoding not only enables the pre-training of an agent from a wide diversity of general unsupervised reward functions, but also provides a way to solve any new downstream task in a zero-shot manner, given a small number of reward-annotated samples. We empirically show that FRE agents trained on diverse random unsupervised reward functions can generalize to solve novel tasks in a range of simulated robotic benchmarks, often outperforming previous zero-shot RL and offline RL methods. Code for this project is provided at: https://github.com/kvfrans/fre
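As a rough illustration of the "state-reward samples" mentioned above, the following minimal sketch builds the kind of context set an encoder could consume. The random linear reward family, the function names (`sample_linear_reward`, `make_context`), and all sizes are assumptions made for this example and are not taken from the paper; the transformer-based variational auto-encoder itself is not implemented here.

```python
import random


def sample_linear_reward(state_dim, rng):
    """Draw a random linear reward r(s) = w . s (an assumed, illustrative task family)."""
    w = [rng.uniform(-1.0, 1.0) for _ in range(state_dim)]

    def reward_fn(state):
        return sum(wi * si for wi, si in zip(w, state))

    return reward_fn


def make_context(states, reward_fn, k, rng):
    """Annotate k randomly chosen offline states with their rewards."""
    chosen = rng.sample(states, k)
    return [(s, reward_fn(s)) for s in chosen]


rng = random.Random(0)

# Stand-in for states taken from unlabeled offline trajectories.
offline_states = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(1000)]

# One randomly drawn task, described only by a few (state, reward) pairs.
task = sample_linear_reward(4, rng)
context = make_context(offline_states, task, 32, rng)
```

In this sketch a new downstream task would be specified the same way at test time: a small set of reward-annotated states, from which an encoder would infer a task representation for the policy.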
One of the grand challenges of reinforcement learning is the ability to generalize to new tasks. However, general agents require a set of rich, diverse tasks to train on. Designing a "foundation environment" for such tasks is tricky: the ideal environment would support a range of emergent phenomena, an expressive task space, and fast runtime. To take a step towards addressing this research bottleneck, this work presents Powderworld, a lightweight yet expressive simulation environment running directly on the GPU. Within Powderworld, two motivating challenge distributions are presented, one for world-modelling and one for reinforcement learning. Each contains hand-designed test tasks to examine generalization. Experiments indicate that increasing the environment's complexity improves generalization for world models and certain reinforcement learning agents, yet may inhibit learning in high-variance environments. Powderworld aims to support the study of generalization by providing a source of diverse tasks arising from the same core rules.
This work presents CLIPDraw, an algorithm that synthesizes novel drawings based on natural language input. CLIPDraw does not require any training; rather, a pre-trained CLIP language-image encoder is used as a metric for maximizing similarity between the given description and a generated drawing. Crucially, CLIPDraw operates over vector strokes rather than pixel images, a constraint that biases drawings towards simpler human-recognizable shapes. Results compare CLIPDraw with other synthesis-through-optimization methods and highlight various interesting behaviors of CLIPDraw, such as satisfying ambiguous text in multiple ways, reliably producing drawings in diverse artistic styles, and scaling from simple to complex visual representations as stroke count is increased. Code for experimenting with the method is available at: https://colab.research.google.com/github/kvfrans/clipdraw/blob/main/clipdraw.ipynb
Inspired by natural evolution, evolutionary search algorithms have proven remarkably capable due to their dual abilities to radiantly explore through diverse populations and to converge to adaptive pressures. A large part of this behavior comes from the selection function of an evolutionary algorithm, which is a metric for deciding which individuals survive to the next generation. In deceptive or hard-to-search fitness landscapes, greedy selection often fails, so it is critical that selection functions strike the correct balance between gradient-exploiting adaptation and exploratory diversification. This paper introduces Sel4Sel, or Selecting for Selection, an algorithm that searches for high-performing neural-network-based selection functions through a meta-evolutionary loop. Results on three distinct bitstring domains indicate that Sel4Sel networks consistently match or exceed the performance of both fitness-based selection and benchmarks explicitly designed to encourage diversity. Analysis of the strongest Sel4Sel networks reveals a general tendency to favor highly novel individuals early on, with a gradual shift towards fitness-based selection as deceptive local optima are bypassed.
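To make the notion of a selection function concrete, here is a minimal hand-written sketch on bitstrings. It is not Sel4Sel: Sel4Sel searches for neural-network selection functions, whereas this example uses a fixed blend of fitness and novelty whose weight decays over generations, loosely mirroring the tendency reported above. The novelty measure (mean Hamming distance), the linear schedule, the OneMax fitness, and all function names are assumptions made for illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length bitstrings differ."""
    return sum(x != y for x, y in zip(a, b))


def novelty(ind, population):
    """Mean Hamming distance from ind to every other individual."""
    others = [p for p in population if p is not ind]
    return sum(hamming(ind, p) for p in others) / len(others)


def select(population, fitness_fn, novelty_weight, n_survivors):
    """Keep the n_survivors individuals with the highest blended score."""

    def score(ind):
        return ((1.0 - novelty_weight) * fitness_fn(ind)
                + novelty_weight * novelty(ind, population))

    return sorted(population, key=score, reverse=True)[:n_survivors]


def novelty_schedule(generation, n_generations):
    """Assumed linear decay: pure novelty at the start, pure fitness at the end."""
    return max(0.0, 1.0 - generation / n_generations)


def onemax(bits):
    """Toy fitness: the number of ones in the bitstring."""
    return sum(bits)


population = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 0]]

# Early generation: the most novel individual survives.
early = select(population, onemax, novelty_schedule(0, 10), 1)
# Final generation: the fittest individual survives.
late = select(population, onemax, novelty_schedule(10, 10), 1)
```

With this population, the early survivor is the all-zeros string (the most distant from the rest) and the late survivor is the all-ones string (the fittest), which shows how the same selection function can trade exploration for exploitation.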
Meta-learning models, or models that learn to learn, have been a long-desired target for their ability to quickly solve new tasks. Traditional meta-learning methods can require expensive inner and outer loops, so there is demand for algorithms that discover strong learners without explicitly searching for them. We draw parallels to the study of evolvable genomes in evolutionary systems (genomes with a strong capacity to adapt) and propose that meta-learning and adaptive evolvability optimize for the same objective: high performance after a set of learning iterations. We argue that population-based evolutionary systems with non-static fitness landscapes naturally bias towards high-evolvability genomes, and therefore optimize for populations with strong learning ability. We demonstrate this claim with a simple evolutionary algorithm, Population-Based Meta Learning (PBML), that consistently discovers genomes which display higher rates of improvement over generations, and can rapidly adapt to solve sparse fitness and robotic control tasks.
Encoding images as a series of high-level constructs, such as brush strokes or discrete shapes, can often be key to both human and machine understanding. In many cases, however, data is only available in pixel form. We present a method for generating images directly in a high-level domain (e.g. brush strokes), without the need for real pairwise data. Specifically, we train a "canvas" network to imitate the mapping of high-level constructs to pixels, followed by a high-level "drawing" network which is optimized through this mapping towards solving a desired image recreation or translation task. We successfully discover sequential vector representations of symbols, large sketches, and 3D objects, utilizing only pixel data. We display applications of our method in image segmentation, and present several ablation studies comparing various configurations.
We develop a metalearning approach for learning hierarchically structured policies, improving sample efficiency on unseen tasks through the use of shared primitives, that is, policies that are executed for large numbers of timesteps. Specifically, a set of primitives is shared within a distribution of tasks, and task-specific policies switch between them. We provide a concrete metric for measuring the strength of such hierarchies, leading to an optimization problem for quickly reaching high reward on unseen tasks. We then present an algorithm to solve this problem end-to-end through the use of any off-the-shelf reinforcement learning method, by repeatedly sampling new tasks and resetting task-specific policies. We successfully discover meaningful motor primitives for the directional movement of four-legged robots, solely by interacting with distributions of mazes. We also demonstrate the transferability of primitives to solve long-timescale sparse-reward obstacle courses, and we enable 3D humanoid robots to robustly walk and crawl with the same policy.
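The execution structure described above, a task-specific policy that picks one shared primitive and lets it act for a fixed number of timesteps, can be sketched as follows. This is only a toy under stated assumptions: the 1D environment, the three hand-coded primitives, the hand-coded task-specific policy, and all names (`rollout`, `env_step`, `master_policy`, `primitive_len`) are hypothetical, and no learning or policy resetting is performed.

```python
def rollout(env_step, state, master_policy, primitives, horizon, primitive_len):
    """Run a two-level policy: re-select a primitive every primitive_len steps."""
    total_reward = 0.0
    active = None
    for t in range(horizon):
        if t % primitive_len == 0:
            # The task-specific policy chooses which shared primitive to run.
            active = master_policy(state)
        # The active primitive produces the low-level action at every step.
        action = primitives[active](state)
        state, reward = env_step(state, action)
        total_reward += reward
    return state, total_reward


GOAL = 5  # toy goal position on a 1D line


def env_step(position, action):
    """Toy dynamics: move by action; reward is negative distance to the goal."""
    new_position = position + action
    return new_position, -float(abs(new_position - GOAL))


# Shared primitives (hand-coded here): move left, move right, stay.
primitives = [lambda s: -1, lambda s: 1, lambda s: 0]


def master_policy(position):
    """Task-specific policy (hand-coded here): head towards the goal, then stay."""
    if position > GOAL:
        return 0
    if position < GOAL:
        return 1
    return 2


final_position, total_reward = rollout(
    env_step, 0, master_policy, primitives, horizon=10, primitive_len=5
)
```

Because the task-specific policy only acts once every `primitive_len` steps, it faces a much shorter decision horizon than the low-level controller, which is one intuition for why such hierarchies can speed up learning on unseen tasks.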
When creating digital art, coloring and shading are often time-consuming tasks that follow the same general patterns. A solution to automatically colorize raw line art would have many practical applications. We propose a setup utilizing two networks in tandem: a color prediction network based only on outlines, and a shading network conditioned on both outlines and a color scheme. We present processing methods to limit information passed in the color scheme, improving generalization. Finally, we demonstrate natural-looking results when colorizing outlines from scratch, as well as from a messy, user-defined color scheme.