Jesse Zhang

SPRINT: Scalable Policy Pre-Training via Language Instruction Relabeling

Jun 20, 2023
Jesse Zhang, Karl Pertsch, Jiahui Zhang, Joseph J. Lim

Pre-training robot policies with a rich set of skills can substantially accelerate the learning of downstream tasks. Prior works have defined pre-training tasks via natural language instructions, but doing so requires tedious human annotation of hundreds of thousands of instructions. Thus, we propose SPRINT, a scalable offline policy pre-training approach which substantially reduces the human effort needed for pre-training a diverse set of skills. Our method uses two core ideas to automatically expand a base set of pre-training tasks: instruction relabeling via large language models and cross-trajectory skill chaining through offline reinforcement learning. As a result, SPRINT pre-training equips robots with a much richer repertoire of skills. Experimental results in a household simulator and on a real robot kitchen manipulation task show that SPRINT leads to substantially faster learning of new long-horizon tasks than previous pre-training approaches. Website at https://clvrai.com/sprint.

* 29 pages, 18 figures 
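
The instruction-relabeling idea above can be made concrete with a small sketch. Below is a minimal, hypothetical illustration: consecutive annotated sub-trajectories are concatenated and given a new instruction summarized by a language model. All names here (`TrajectorySegment`, `summarize_with_llm`) are illustrative stand-ins, not the paper's actual code, and the trivial string join stands in for a real LLM call.

```python
# A minimal sketch of LLM-based instruction relabeling in the spirit of
# SPRINT. All names are hypothetical; the paper's pipeline and prompts differ.
from dataclasses import dataclass
from typing import List

@dataclass
class TrajectorySegment:
    states: List[list]   # sequence of (placeholder) state vectors
    actions: List[int]   # sequence of (placeholder) discrete actions
    instruction: str     # natural-language annotation for this segment

def summarize_with_llm(instructions: List[str]) -> str:
    """Stand-in for an LLM call that compresses consecutive sub-task
    instructions into one higher-level instruction."""
    return " and then ".join(instructions)  # trivial placeholder summary

def relabel(segments: List[TrajectorySegment]) -> TrajectorySegment:
    """Merge consecutive annotated segments into one longer pre-training
    task whose label is an LLM summary of the original instructions."""
    return TrajectorySegment(
        states=[s for seg in segments for s in seg.states],
        actions=[a for seg in segments for a in seg.actions],
        instruction=summarize_with_llm([seg.instruction for seg in segments]),
    )

# Example: two human-annotated sub-tasks become one aggregated task.
segs = [
    TrajectorySegment([[0.0]], [0], "pick up the mug"),
    TrajectorySegment([[1.0]], [1], "place the mug in the sink"),
]
print(relabel(segs).instruction)
```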

Hierarchical Neural Program Synthesis

Mar 09, 2023
Linghan Zhong, Ryan Lindeborg, Jesse Zhang, Joseph J. Lim, Shao-Hua Sun

Program synthesis aims to automatically construct human-readable programs that satisfy given task specifications, such as input/output pairs or demonstrations. Recent works have demonstrated encouraging results in a variety of domains, such as string transformation, tensor manipulation, and describing behaviors of embodied agents. Most existing program synthesis methods are designed to synthesize programs from scratch, generating a program token by token, line by line. This fundamentally prevents these methods from scaling up to synthesize programs that are longer or more complex. In this work, we present a scalable program synthesis framework that instead synthesizes a program by hierarchically composing programs. Specifically, we first learn a task embedding space and a program decoder that can decode a task embedding into a program. Then, we train a high-level module to comprehend the task specification (e.g., input/output pairs or demonstrations) from long programs and produce a sequence of task embeddings, which are then decoded by the program decoder and composed to yield the synthesized program. We extensively evaluate our proposed framework in a string transformation domain with input/output pairs. The experimental results demonstrate that the proposed framework can synthesize programs that are significantly longer and more complex than the programs considered in prior program synthesis works. Website at https://thoughtp0lice.github.io/hnps_web/
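
As a rough illustration of the hierarchical decoding described above, the sketch below shows a high-level module emitting a sequence of task embeddings that a program decoder turns into snippets, which are then composed into one longer program. The learned networks are replaced with stubs, and all names are hypothetical, so this is a shape-of-the-pipeline sketch rather than the paper's implementation.

```python
# Illustrative-only sketch of hierarchical program composition: spec ->
# sequence of task embeddings -> decoded snippets -> composed program.
import numpy as np

def high_level_module(spec, num_steps):
    """Stub for the learned module that reads the task specification
    (input/output pairs) and emits task embeddings (here: random vectors)."""
    rng = np.random.default_rng(0)
    return [rng.normal(size=8) for _ in range(num_steps)]

def program_decoder(embedding):
    """Stub for the learned decoder that maps one task embedding to a
    short program snippet (here: a canned string-transformation line)."""
    primitives = ["x = x.strip()", "x = x[::-1]", "x = x.upper()"]
    return primitives[int(abs(embedding[0]) * 100) % len(primitives)]

def synthesize(spec, num_steps=3):
    """Decode each embedding, then concatenate the snippets: the composed
    program can be far longer than any single decoder output."""
    embeddings = high_level_module(spec, num_steps)
    return "\n".join(program_decoder(e) for e in embeddings)

spec = [("  ab  ", "BA")]  # a single input/output example
print(synthesize(spec))
```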

Learning to Synthesize Programs as Interpretable and Generalizable Policies

Aug 31, 2021
Dweep Trivedi, Jesse Zhang, Shao-Hua Sun, Joseph J. Lim

Recently, deep reinforcement learning (DRL) methods have achieved impressive performance on tasks in a variety of domains. However, neural network policies produced with DRL methods are not human-interpretable and often have difficulty generalizing to novel scenarios. To address these issues, prior works explore learning programmatic policies that are more interpretable and structured for generalization. Yet, these works either employ limited policy representations (e.g. decision trees, state machines, or predefined program templates) or require stronger supervision (e.g. input/output state pairs or expert demonstrations). We present a framework that instead learns to synthesize a program, which details the procedure to solve a task in a flexible and expressive manner, solely from reward signals. To alleviate the difficulty of learning to compose programs to induce the desired agent behavior from scratch, we propose to first learn a program embedding space that continuously parameterizes diverse behaviors in an unsupervised manner and then search over the learned program embedding space to yield a program that maximizes the return for a given task. Experimental results demonstrate that the proposed framework not only learns to reliably synthesize task-solving programs but also outperforms DRL and program synthesis baselines while producing interpretable and more generalizable policies. We also justify the necessity of the proposed two-stage learning scheme as well as analyze various methods for learning the program embedding.

* 52 pages, 16 figures, 12 tables 
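
The second stage (searching the learned program embedding space for a return-maximizing program) can be sketched with a generic cross-entropy-method loop, shown below. The decoder and environment are collapsed into a toy objective, and the hyperparameters are arbitrary; the paper's exact search procedure may differ.

```python
# A compact CEM sketch: search a latent program-embedding space for a
# vector that maximizes episode return. episode_return is a toy stand-in
# for "decode z into a program, run it, and collect reward".
import numpy as np

LATENT_DIM = 16

def episode_return(z):
    """Stub for program decoding + environment rollout; here a simple
    quadratic objective with a known optimum."""
    target = np.linspace(-1.0, 1.0, LATENT_DIM)
    return -float(np.sum((z - target) ** 2))

def cem_search(iters=50, pop=64, elite_frac=0.1):
    rng = np.random.default_rng(0)
    mean, std = np.zeros(LATENT_DIM), np.ones(LATENT_DIM)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, LATENT_DIM))
        returns = np.array([episode_return(z) for z in samples])
        elites = samples[np.argsort(returns)[-n_elite:]]  # top candidates
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mean

best_z = cem_search()
print(f"best return: {episode_return(best_z):.4f}")
```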

TRECVID 2020: A comprehensive campaign for evaluating video retrieval tasks across multiple application domains

Apr 27, 2021
George Awad, Asad A. Butt, Keith Curtis, Jonathan Fiscus, Afzal Godil, Yooyoung Lee, Andrew Delgado, Jesse Zhang, Eliot Godard, Baptiste Chocot, Lukas Diduch, Jeffrey Liu, Alan F. Smeaton, Yvette Graham, Gareth J. F. Jones, Wessel Kraaij, Georges Quenot

The TREC Video Retrieval Evaluation (TRECVID) is a TREC-style video analysis and retrieval evaluation with the goal of promoting progress in research and development of content-based exploitation and retrieval of information from digital video via open, metrics-based evaluation. Over the last twenty years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID has been funded by NIST (National Institute of Standards and Technology) and other US government agencies. In addition, many organizations and individuals worldwide contribute significant time and effort. TRECVID 2020 represented a continuation of four tasks and the addition of two new tasks. In total, 29 teams from various research organizations worldwide completed one or more of the following six tasks: 1. Ad-hoc Video Search (AVS), 2. Instance Search (INS), 3. Disaster Scene Description and Indexing (DSDI), 4. Video to Text Description (VTT), 5. Activities in Extended Video (ActEV), 6. Video Summarization (VSUM). This paper is an introduction to the evaluation framework, tasks, data, and measures used in the evaluation campaign.

* TRECVID 2020 Workshop Overview Paper. arXiv admin note: substantial text overlap with arXiv:2009.09984 

Hierarchical Reinforcement Learning By Discovering Intrinsic Options

Jan 16, 2021
Jesse Zhang, Haonan Yu, Wei Xu

We propose a hierarchical reinforcement learning method, HIDIO, that can learn task-agnostic options in a self-supervised manner while jointly learning to utilize them to solve sparse-reward tasks. Unlike current hierarchical RL approaches that tend to formulate goal-reaching low-level tasks or pre-define ad hoc lower-level policies, HIDIO encourages lower-level option learning that is independent of the task at hand, requiring few assumptions about the task structure. These options are learned through an intrinsic entropy minimization objective conditioned on the option sub-trajectories. The learned options are diverse and task-agnostic. In experiments on sparse-reward robotic manipulation and navigation tasks, HIDIO achieves higher success rates with greater sample efficiency than regular RL baselines and two state-of-the-art hierarchical RL methods.

* ICLR 2021. 19 pages, 9 figures. Code at https://www.github.com/jesbu1/hidio 
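
One common way to realize an objective of this flavor, sketched below under simplifying assumptions, is a discriminator that predicts the active option from sub-trajectory features, with its log-probability used as an intrinsic reward for the low-level policy. This is an illustrative simplification, not HIDIO's exact formulation, and every name in the snippet is hypothetical.

```python
# Rough sketch: a softmax discriminator over options given sub-trajectory
# features; the low-level policy is rewarded when its option is easy to
# identify from the sub-trajectory it produced.
import numpy as np

NUM_OPTIONS, FEAT_DIM = 4, 6
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(FEAT_DIM, NUM_OPTIONS))  # discriminator weights

def option_log_probs(subtraj_feat):
    """Log-softmax over options given sub-trajectory features."""
    logits = subtraj_feat @ W
    logits -= logits.max()  # numerical stability
    return logits - np.log(np.exp(logits).sum())

def intrinsic_reward(subtraj_feat, option):
    """Higher reward for sub-trajectories that reveal their option."""
    return float(option_log_probs(subtraj_feat)[option])

feat = rng.normal(size=FEAT_DIM)  # placeholder sub-trajectory encoding
print(intrinsic_reward(feat, option=2))
```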

COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning

Oct 27, 2020
Avi Singh, Albert Yu, Jonathan Yang, Jesse Zhang, Aviral Kumar, Sergey Levine

Reinforcement learning has been applied to a wide variety of robotics problems, but most such applications involve collecting data from scratch for each new task. Since the amount of robot data we can collect for any single task is limited by time and cost considerations, the learned behavior is typically narrow: the policy can only execute the task in a handful of scenarios that it was trained on. What if there was a way to incorporate a large amount of prior data, either from previously solved tasks or from unsupervised or undirected environment interaction, to extend and generalize learned behaviors? While most prior work on extending robotic skills using pre-collected data focuses on building explicit hierarchies or skill decompositions, we show in this paper that we can reuse prior data to extend new skills simply through dynamic programming. We show that even when the prior data does not actually succeed at solving the new task, it can still be utilized for learning a better policy, by providing the agent with a broader understanding of the mechanics of its environment. We demonstrate the effectiveness of our approach by chaining together several behaviors seen in prior datasets for solving a new task, with our hardest experimental setting involving composing four robotic skills in a row: picking, placing, drawer opening, and grasping, where a +1/0 sparse reward is provided only on task completion. We train our policies in an end-to-end fashion, mapping high-dimensional image observations to low-level robot control commands, and present results in both simulated and real world domains. Additional materials and source code can be found on our project website: https://sites.google.com/view/cog-rl

* Accepted to CoRL 2020. Source code and videos available at https://sites.google.com/view/cog-rl 
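
A tabular toy version of the core idea is sketched below: repeated Bellman backups over a buffer that merges reward-free prior transitions with sparse-reward task transitions let value propagate from the new task back into states visited only in the prior data. This is a deliberately simplified stand-in; the actual method operates on images and uses conservative offline RL, which this sketch omits.

```python
# Toy illustration of offline dynamic programming over merged data: value
# from a sparse +1 task reward flows backward through reward-free prior
# transitions, "connecting" old skills to the new task.
import numpy as np

N_STATES, N_ACTIONS, GAMMA = 6, 2, 0.9
# Transitions as (s, a, r, s_next, done). The first three come from
# reward-free prior data; the last three from the new task, whose +1
# sparse reward appears only at completion.
prior_data = [(0, 1, 0.0, 1, False), (1, 0, 0.0, 2, False), (2, 1, 0.0, 3, False)]
task_data = [(3, 0, 0.0, 4, False), (4, 1, 0.0, 5, False), (5, 0, 1.0, 5, True)]
dataset = prior_data + task_data  # dynamic programming treats both alike

Q = np.zeros((N_STATES, N_ACTIONS))
for _ in range(100):  # repeated Bellman backups over the merged buffer
    for s, a, r, s2, done in dataset:
        Q[s, a] = r + (0.0 if done else GAMMA * Q[s2].max())

print(np.round(Q.max(axis=1), 3))  # task value reaches prior-data states
```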

TRECVID 2019: An Evaluation Campaign to Benchmark Video Activity Detection, Video Captioning and Matching, and Video Search & Retrieval

Sep 21, 2020
George Awad, Asad A. Butt, Keith Curtis, Yooyoung Lee, Jonathan Fiscus, Afzal Godil, Andrew Delgado, Jesse Zhang, Eliot Godard, Lukas Diduch, Alan F. Smeaton, Yvette Graham, Wessel Kraaij, Georges Quenot

The TREC Video Retrieval Evaluation (TRECVID) 2019 was a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in research and development of content-based exploitation and retrieval of information from digital video via open, metrics-based evaluation. Over the last nineteen years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID has been funded by NIST (National Institute of Standards and Technology) and other US government agencies. In addition, many organizations and individuals worldwide contribute significant time and effort. TRECVID 2019 represented a continuation of four tasks from TRECVID 2018. In total, 27 teams from various research organizations worldwide completed one or more of the following four tasks: 1. Ad-hoc Video Search (AVS), 2. Instance Search (INS), 3. Activities in Extended Video (ActEV), 4. Video to Text Description (VTT). This paper is an introduction to the evaluation framework, tasks, data, and measures used in the workshop.

* TRECVID Workshop overview paper. 39 pages 

Cautious Adaptation For Reinforcement Learning in Safety-Critical Settings

Aug 15, 2020
Jesse Zhang, Brian Cheung, Chelsea Finn, Sergey Levine, Dinesh Jayaraman

Reinforcement learning (RL) in real-world safety-critical target settings like urban driving is hazardous, imperiling the RL agent, other agents, and the environment. To overcome this difficulty, we propose a "safety-critical adaptation" task setting: an agent first trains in non-safety-critical "source" environments such as in a simulator, before it adapts to the target environment where failures carry heavy costs. We propose a solution approach, CARL, that builds on the intuition that prior experience in diverse environments equips an agent to estimate risk, which in turn enables relative safety through risk-averse, cautious adaptation. CARL first employs model-based RL to train a probabilistic model to capture uncertainty about transition dynamics and catastrophic states across varied source environments. Then, when exploring a new safety-critical environment with unknown dynamics, the CARL agent plans to avoid actions that could lead to catastrophic states. In experiments on car driving, cartpole balancing, half-cheetah locomotion, and robotic object manipulation, CARL successfully acquires cautious exploration behaviors, yielding higher rewards with fewer failures than strong RL adaptation baselines. Website at https://sites.google.com/berkeley.edu/carl.

* 15 pages, 8 figures, ICML 2020. Website with code: https://sites.google.com/berkeley.edu/carl 
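
The planning step can be illustrated with a small, hypothetical model-predictive-control loop: candidate action sequences are scored under every member of a dynamics ensemble, any predicted catastrophe incurs a large penalty, and the planner executes the first action of the sequence with the best worst-case score. The dynamics model and catastrophe predicate below are toy stand-ins, not CARL's actual learned model.

```python
# Risk-averse planning sketch: judge each candidate action sequence by its
# WORST score across an ensemble, so uncertainty about catastrophic states
# translates into cautious behavior.
import numpy as np

rng = np.random.default_rng(0)
ENSEMBLE = [rng.normal(scale=0.2, size=2) for _ in range(5)]  # per-member biases
CATASTROPHE_PENALTY = 100.0

def rollout_score(action_seq, model_bias):
    """Toy 2-D point dynamics under one ensemble member; reward is progress
    toward the origin, with a large penalty for leaving a safe box."""
    state, score = np.array([2.0, 2.0]), 0.0
    for a in action_seq:
        state = state + a + model_bias  # member-specific transition
        if np.any(np.abs(state) > 3.0):  # "catastrophic" region
            return -CATASTROPHE_PENALTY
        score -= float(np.linalg.norm(state))
    return score

def plan(horizon=5, candidates=256):
    seqs = rng.uniform(-1.0, 1.0, size=(candidates, horizon, 2))
    worst = [min(rollout_score(seq, m) for m in ENSEMBLE) for seq in seqs]
    return seqs[int(np.argmax(worst))][0]  # execute first action (MPC-style)

print("first cautious action:", plan())
```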

Unsupervised Projection Networks for Generative Adversarial Networks

Oct 06, 2019
Daiyaan Arfeen, Jesse Zhang

We propose the use of unsupervised learning to train projection networks that project onto the latent space of an already trained generator. We apply our method to a trained StyleGAN, and use our projection network to perform image super-resolution and clustering of images into semantically identifiable groups.

* 6 Pages, 8 Figures, ICCV 2019 Workshop: Sensing, Understanding and Synthesizing Humans 
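
One way to train such a projection network without labels, sketched below with linear maps standing in for StyleGAN and a deep encoder, is to sample latents z, generate images G(z), and regress the projector's output back onto z, so the frozen generator supplies its own supervision. This is a minimal sketch under those assumptions; the paper's actual training objective may differ.

```python
# Bare-bones latent-projection sketch: learn E so that E(G(z)) ~= z using
# only samples from the frozen generator G (no labeled data).
import numpy as np

rng = np.random.default_rng(0)
Z_DIM, X_DIM, LR, BATCH = 2, 4, 0.1, 32
G = rng.normal(size=(X_DIM, Z_DIM)) / np.sqrt(Z_DIM)  # frozen "generator"
E = np.zeros((Z_DIM, X_DIM))                          # projection network

for _ in range(1000):
    z = rng.normal(size=(Z_DIM, BATCH))  # sample latent codes
    x = G @ z                            # generate "images"
    grad = (E @ x - z) @ x.T / BATCH     # grad of mean ||E(x) - z||^2 wrt E
    E -= LR * grad

# The trained E now projects a generated image back near its latent code.
z = rng.normal(size=(Z_DIM, 1))
print("latent recovery error:", float(np.linalg.norm(E @ (G @ z) - z)))
```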