Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Stephanie Milani

Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers

Apr 28, 2022

Micah Carroll, Jessy Lin, Orr Paradise, Raluca Georgescu, Mingfei Sun, David Bignell, Stephanie Milani, Katja Hofmann, Matthew Hausknecht, Anca Dragan(+1 more)

Figure 1 for Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers

Figure 2 for Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers

Figure 3 for Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers

Figure 4 for Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers

Abstract:Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks. In this work, we observe that the same idea also applies naturally to sequential decision making, where many well-studied tasks like behavior cloning, offline RL, inverse dynamics, and waypoint conditioning correspond to different sequence maskings over a sequence of states, actions, and returns. We introduce the FlexiBiT framework, which provides a unified way to specify models which can be trained on many different sequential decision making tasks. We show that a single FlexiBiT model is simultaneously capable of carrying out many tasks with performance similar to or better than specialized models. Additionally, we show that performance can be further improved by fine-tuning our general model on specific tasks of interest.

Via

Access Paper or Ask Questions

Retrospective on the 2021 BASALT Competition on Learning from Human Feedback

Apr 14, 2022

Rohin Shah, Steven H. Wang, Cody Wild, Stephanie Milani, Anssi Kanervisto, Vinicius G. Goecks, Nicholas Waytowich, David Watkins-Valls, Bharat Prakash, Edmund Mills(+6 more)

Figure 1 for Retrospective on the 2021 BASALT Competition on Learning from Human Feedback

Figure 2 for Retrospective on the 2021 BASALT Competition on Learning from Human Feedback

Figure 3 for Retrospective on the 2021 BASALT Competition on Learning from Human Feedback

Figure 4 for Retrospective on the 2021 BASALT Competition on Learning from Human Feedback

Abstract:We held the first-ever MineRL Benchmark for Agents that Solve Almost-Lifelike Tasks (MineRL BASALT) Competition at the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021). The goal of the competition was to promote research towards agents that use learning from human feedback (LfHF) techniques to solve open-world tasks. Rather than mandating the use of LfHF techniques, we described four tasks in natural language to be accomplished in the video game Minecraft, and allowed participants to use any approach they wanted to build agents that could accomplish the tasks. Teams developed a diverse range of LfHF algorithms across a variety of possible human feedback types. The three winning teams implemented significantly different approaches while achieving similar performance. Interestingly, their approaches performed well on different tasks, validating our choice of tasks to include in the competition. While the outcomes validated the design of our competition, we did not get as many participants and submissions as our sister competition, MineRL Diamond. We speculate about the causes of this problem and suggest improvements for future iterations of the competition.

* Accepted to the PMLR NeurIPS 2021 Demo & Competition Track volume

Via

Access Paper or Ask Questions

MineRL Diamond 2021 Competition: Overview, Results, and Lessons Learned

Feb 17, 2022

Anssi Kanervisto, Stephanie Milani, Karolis Ramanauskas, Nicholay Topin, Zichuan Lin, Junyou Li, Jianing Shi, Deheng Ye, Qiang Fu, Wei Yang(+12 more)

Figure 1 for MineRL Diamond 2021 Competition: Overview, Results, and Lessons Learned

Figure 2 for MineRL Diamond 2021 Competition: Overview, Results, and Lessons Learned

Figure 3 for MineRL Diamond 2021 Competition: Overview, Results, and Lessons Learned

Figure 4 for MineRL Diamond 2021 Competition: Overview, Results, and Lessons Learned

Abstract:Reinforcement learning competitions advance the field by providing appropriate scope and support to develop solutions toward a specific problem. To promote the development of more broadly applicable methods, organizers need to enforce the use of general techniques, the use of sample-efficient methods, and the reproducibility of the results. While beneficial for the research community, these restrictions come at a cost -- increased difficulty. If the barrier for entry is too high, many potential participants are demoralized. With this in mind, we hosted the third edition of the MineRL ObtainDiamond competition, MineRL Diamond 2021, with a separate track in which we permitted any solution to promote the participation of newcomers. With this track and more extensive tutorials and support, we saw an increased number of submissions. The participants of this easier track were able to obtain a diamond, and the participants of the harder track progressed the generalizable solutions in the same task.

* Under review for PMLR volume on NeurIPS 2021 competitions

Via

Access Paper or Ask Questions

A Survey of Explainable Reinforcement Learning

Feb 17, 2022

Stephanie Milani, Nicholay Topin, Manuela Veloso, Fei Fang

Figure 1 for A Survey of Explainable Reinforcement Learning

Figure 2 for A Survey of Explainable Reinforcement Learning

Figure 3 for A Survey of Explainable Reinforcement Learning

Figure 4 for A Survey of Explainable Reinforcement Learning

Abstract:Explainable reinforcement learning (XRL) is an emerging subfield of explainable machine learning that has attracted considerable attention in recent years. The goal of XRL is to elucidate the decision-making process of learning agents in sequential decision-making settings. In this survey, we propose a novel taxonomy for organizing the XRL literature that prioritizes the RL setting. We overview techniques according to this taxonomy. We point out gaps in the literature, which we use to motivate and outline a roadmap for future work.

Via

Access Paper or Ask Questions

The MineRL BASALT Competition on Learning from Human Feedback

Jul 05, 2021

Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, William Guss, Sharada Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay Topin(+3 more)

Figure 1 for The MineRL BASALT Competition on Learning from Human Feedback

Figure 2 for The MineRL BASALT Competition on Learning from Human Feedback

Abstract:The last decade has seen a significant increase of interest in deep learning research, with many public successes that have demonstrated its potential. As such, these systems are now being incorporated into commercial products. With this comes an additional challenge: how can we build AI systems that solve tasks where there is not a crisp, well-defined specification? While multiple solutions have been proposed, in this competition we focus on one in particular: learning from human feedback. Rather than training AI systems using a predefined reward function or using a labeled dataset with a predefined set of categories, we instead train the AI system using a learning signal derived from some form of human feedback, which can evolve over time as the understanding of the task changes, or as the capabilities of the AI system improve. The MineRL BASALT competition aims to spur forward research on this important class of techniques. We design a suite of four tasks in Minecraft for which we expect it will be hard to write down hardcoded reward functions. These tasks are defined by a paragraph of natural language: for example, "create a waterfall and take a scenic picture of it", with additional clarifying details. Participants must train a separate agent for each task, using any method they want. Agents are then evaluated by humans who have read the task description. To help participants get started, we provide a dataset of human demonstrations on each of the four tasks, as well as an imitation learning baseline that leverages these demonstrations. Our hope is that this competition will improve our ability to build AI systems that do what their designers intend them to do, even when the intent cannot be easily formalized. Besides allowing AI to solve more tasks, this can also enable more effective regulation of AI systems, as well as making progress on the value alignment problem.

* NeurIPS 2021 Competition Track

Via

Access Paper or Ask Questions

Towards robust and domain agnostic reinforcement learning competitions

Jun 07, 2021

William Hebgen Guss, Stephanie Milani, Nicholay Topin, Brandon Houghton, Sharada Mohanty, Andrew Melnik, Augustin Harter, Benoit Buschmaas, Bjarne Jaster, Christoph Berganski(+19 more)

Figure 1 for Towards robust and domain agnostic reinforcement learning competitions

Figure 2 for Towards robust and domain agnostic reinforcement learning competitions

Figure 3 for Towards robust and domain agnostic reinforcement learning competitions

Figure 4 for Towards robust and domain agnostic reinforcement learning competitions

Abstract:Reinforcement learning competitions have formed the basis for standard research benchmarks, galvanized advances in the state-of-the-art, and shaped the direction of the field. Despite this, a majority of challenges suffer from the same fundamental problems: participant solutions to the posed challenge are usually domain-specific, biased to maximally exploit compute resources, and not guaranteed to be reproducible. In this paper, we present a new framework of competition design that promotes the development of algorithms that overcome these barriers. We propose four central mechanisms for achieving this end: submission retraining, domain randomization, desemantization through domain obfuscation, and the limitation of competition compute and environment-sample budget. To demonstrate the efficacy of this design, we proposed, organized, and ran the MineRL 2020 Competition on Sample-Efficient Reinforcement Learning. In this work, we describe the organizational outcomes of the competition and show that the resulting participant submissions are reproducible, non-specific to the competition environment, and sample/resource efficient, despite the difficult competition task.

* 20 pages, several figures, published PMLR

Via

Access Paper or Ask Questions

Iterative Bounding MDPs: Learning Interpretable Policies via Non-Interpretable Methods

Feb 25, 2021

Nicholay Topin, Stephanie Milani, Fei Fang, Manuela Veloso

Figure 1 for Iterative Bounding MDPs: Learning Interpretable Policies via Non-Interpretable Methods

Figure 2 for Iterative Bounding MDPs: Learning Interpretable Policies via Non-Interpretable Methods

Figure 3 for Iterative Bounding MDPs: Learning Interpretable Policies via Non-Interpretable Methods

Figure 4 for Iterative Bounding MDPs: Learning Interpretable Policies via Non-Interpretable Methods

Abstract:Current work in explainable reinforcement learning generally produces policies in the form of a decision tree over the state space. Such policies can be used for formal safety verification, agent behavior prediction, and manual inspection of important features. However, existing approaches fit a decision tree after training or use a custom learning procedure which is not compatible with new learning techniques, such as those which use neural networks. To address this limitation, we propose a novel Markov Decision Process (MDP) type for learning decision tree policies: Iterative Bounding MDPs (IBMDPs). An IBMDP is constructed around a base MDP so each IBMDP policy is guaranteed to correspond to a decision tree policy for the base MDP when using a method-agnostic masking procedure. Because of this decision tree equivalence, any function approximator can be used during training, including a neural network, while yielding a decision tree policy for the base MDP. We present the required masking procedure as well as a modified value update step which allows IBMDPs to be solved using existing algorithms. We apply this procedure to produce IBMDP variants of recent reinforcement learning methods. We empirically show the benefits of our approach by solving IBMDPs to produce decision tree policies for the base MDPs.

Via

Access Paper or Ask Questions

The MineRL 2020 Competition on Sample Efficient Reinforcement Learning using Human Priors

Jan 26, 2021

William H. Guss, Mario Ynocente Castro, Sam Devlin, Brandon Houghton, Noboru Sean Kuno, Crissman Loomis, Stephanie Milani, Sharada Mohanty, Keisuke Nakata, Ruslan Salakhutdinov(+5 more)

Figure 1 for The MineRL 2020 Competition on Sample Efficient Reinforcement Learning using Human Priors

Figure 2 for The MineRL 2020 Competition on Sample Efficient Reinforcement Learning using Human Priors

Figure 3 for The MineRL 2020 Competition on Sample Efficient Reinforcement Learning using Human Priors

Figure 4 for The MineRL 2020 Competition on Sample Efficient Reinforcement Learning using Human Priors

Abstract:Although deep reinforcement learning has led to breakthroughs in many difficult domains, these successes have required an ever-increasing number of samples, affording only a shrinking segment of the AI community access to their development. Resolution of these limitations requires new, sample-efficient methods. To facilitate research in this direction, we propose this second iteration of the MineRL Competition. The primary goal of the competition is to foster the development of algorithms which can efficiently leverage human demonstrations to drastically reduce the number of samples needed to solve complex, hierarchical, and sparse environments. To that end, participants compete under a limited environment sample-complexity budget to develop systems which solve the MineRL ObtainDiamond task in Minecraft, a sequential decision making environment requiring long-term planning, hierarchical control, and efficient exploration methods. The competition is structured into two rounds in which competitors are provided several paired versions of the dataset and environment with different game textures and shaders. At the end of each round, competitors submit containerized versions of their learning algorithms to the AIcrowd platform where they are trained from scratch on a hold-out dataset-environment pair for a total of 4-days on a pre-specified hardware platform. In this follow-up iteration to the NeurIPS 2019 MineRL Competition, we implement new features to expand the scale and reach of the competition. In response to the feedback of the previous participants, we introduce a second minor track focusing on solutions without access to environment interactions of any kind except during test-time. Further we aim to prompt domain agnostic submissions by implementing several novel competition mechanics including action-space randomization and desemantization of observations and actions.

* 37 pages, initial submission, accepted at NeurIPS. arXiv admin note: substantial text overlap with arXiv:1904.10079

Via

Access Paper or Ask Questions

Guaranteeing Reproducibility in Deep Learning Competitions

May 12, 2020

Brandon Houghton, Stephanie Milani, Nicholay Topin, William Guss, Katja Hofmann, Diego Perez-Liebana, Manuela Veloso, Ruslan Salakhutdinov

Abstract:To encourage the development of methods with reproducible and robust training behavior, we propose a challenge paradigm where competitors are evaluated directly on the performance of their learning procedures rather than pre-trained agents. Since competition organizers re-train proposed methods in a controlled setting they can guarantee reproducibility, and -- by retraining submissions using a held-out test set -- help ensure generalization past the environments on which they were trained.

* Accepted as a poster presentation to the 2019 NeruIPS Challenges in Machine Learning workshop (CiML)

Via

Access Paper or Ask Questions

Retrospective Analysis of the 2019 MineRL Competition on Sample Efficient Reinforcement Learning

Mar 27, 2020

Stephanie Milani, Nicholay Topin, Brandon Houghton, William H. Guss, Sharada P. Mohanty, Keisuke Nakata, Oriol Vinyals, Noboru Sean Kuno

Figure 1 for Retrospective Analysis of the 2019 MineRL Competition on Sample Efficient Reinforcement Learning

Figure 2 for Retrospective Analysis of the 2019 MineRL Competition on Sample Efficient Reinforcement Learning

Figure 3 for Retrospective Analysis of the 2019 MineRL Competition on Sample Efficient Reinforcement Learning

Figure 4 for Retrospective Analysis of the 2019 MineRL Competition on Sample Efficient Reinforcement Learning

Abstract:To facilitate research in the direction of sample-efficient reinforcement learning, we held the MineRL Competition on Sample-Efficient Reinforcement Learning Using Human Priors at the Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2019). The primary goal of this competition was to promote the development of algorithms that use human demonstrations alongside reinforcement learning to reduce the number of samples needed to solve complex, hierarchical, and sparse environments. We describe the competition and provide an overview of the top solutions, each of which uses deep reinforcement learning and/or imitation learning. We also discuss the impact of our organizational decisions on the competition as well as future directions for improvement.

* 10 pages, 2 figures

Via

Access Paper or Ask Questions