John Salvatier

Active Reinforcement Learning: Observing Rewards at a Cost

Nov 24, 2020

David Krueger, Jan Leike, Owain Evans, John Salvatier

Abstract: Active reinforcement learning (ARL) is a variant on reinforcement learning where the agent does not observe the reward unless it chooses to pay a query cost c > 0. The central question of ARL is how to quantify the long-term value of reward information. Even in multi-armed bandits, computing the value of this information is intractable, and we have to rely on heuristics. We propose and evaluate several heuristic approaches for ARL in multi-armed bandits and (tabular) Markov decision processes, and discuss and illustrate some challenging aspects of the ARL problem.

* Originally appeared at the NeurIPS 2016 "Future of Interactive Learning Machines (FILM)" workshop
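
The ARL setting described in the abstract can be illustrated with a minimal sketch. This is not one of the heuristics from the paper: it is an assumed toy strategy that queries each arm of a Bernoulli bandit a fixed number of times, then stops paying and exploits its estimate. The function name run_active_bandit and the parameters means, query_cost, horizon, and queries_per_arm are all hypothetical, chosen only for illustration.

```python
import random


def run_active_bandit(means, query_cost, horizon, queries_per_arm, seed=0):
    """Toy ARL bandit (illustrative assumption, not the paper's method).

    The agent earns the reward of every pull, but only SEES it when it
    pays query_cost. It queries each arm queries_per_arm times, then
    exploits the arm with the best observed mean without querying.
    Returns (total reward minus query costs, number of queries).
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms   # observed (queried) pulls per arm
    sums = [0.0] * n_arms   # sum of observed rewards per arm
    total_reward = 0.0
    n_queries = 0

    def estimate(arm):
        # Uninformed prior guess of 0.5 when nothing has been observed.
        return sums[arm] / counts[arm] if counts[arm] else 0.5

    for _ in range(horizon):
        under_sampled = [a for a in range(n_arms) if counts[a] < queries_per_arm]
        if under_sampled:
            arm, query = rng.choice(under_sampled), True
        else:
            arm, query = max(range(n_arms), key=estimate), False

        reward = 1.0 if rng.random() < means[arm] else 0.0
        total_reward += reward          # reward accrues whether or not it is seen
        if query:
            n_queries += 1              # pay the cost c to observe the reward
            counts[arm] += 1
            sums[arm] += reward

    return total_reward - query_cost * n_queries, n_queries


net_return, n_queries = run_active_bandit([0.0, 1.0], 0.5, 10, 2)
print(net_return, n_queries)
```

Raising queries_per_arm improves the estimates but costs more in total, which is the trade-off the abstract refers to as the long-term value of reward information.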

When Will AI Exceed Human Performance? Evidence from AI Experts

May 03, 2018

Katja Grace, John Salvatier, Allan Dafoe, Baobao Zhang, Owain Evans

Abstract: Advances in artificial intelligence (AI) will transform modern life by reshaping transportation, health, science, finance, and the military. To adapt public policy, we need to better anticipate these advances. Here we report the results from a large survey of machine learning researchers on their beliefs about progress in AI. Researchers predict AI will outperform humans in many activities in the next ten years, such as translating languages (by 2024), writing high-school essays (by 2026), driving a truck (by 2027), working in retail (by 2031), writing a bestselling book (by 2049), and working as a surgeon (by 2053). Researchers believe there is a 50% chance of AI outperforming humans in all tasks in 45 years and of automating all human jobs in 120 years, with Asian respondents expecting these dates much sooner than North Americans. These results will inform discussion amongst researchers and policymakers about anticipating and managing trends in AI.

* Accepted by Journal of Artificial Intelligence Research (AI and Society Track). Minor update to refer to related work (page 5)

Agent-Agnostic Human-in-the-Loop Reinforcement Learning

Jan 15, 2017

David Abel, John Salvatier, Andreas Stuhlmüller, Owain Evans

Abstract: Providing Reinforcement Learning agents with expert advice can dramatically improve various aspects of learning. Prior work has developed teaching protocols that enable agents to learn efficiently in complex environments; many of these methods tailor the teacher's guidance to agents with a particular representation or underlying learning scheme, offering effective but specialized teaching procedures. In this work, we explore protocol programs, an agent-agnostic schema for Human-in-the-Loop Reinforcement Learning. Our goal is to incorporate the beneficial properties of a human teacher into Reinforcement Learning without making strong assumptions about the inner workings of the agent. We show how to represent existing approaches such as action pruning, reward shaping, and training in simulation as special cases of our schema and conduct preliminary experiments on simple domains.

* Presented at the NIPS Workshop on the Future of Interactive Learning Machines, 2016
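
One way to picture the agent-agnostic idea in the abstract is a wrapper that sits between the agent and the environment and never inspects the agent's internals. The sketch below shows action pruning in that style. It is an assumption made for illustration, not the paper's code: LineWorld, ActionPruningProtocol, is_allowed, fallback, and run_episode are hypothetical names, and the toy environment is invented here.

```python
class LineWorld:
    """Toy environment (hypothetical): positions 0..4, start at 2.
    Reaching 0 ends the episode with reward -1, reaching 4 with +1."""

    def reset(self):
        self.state = 2
        return self.state

    def step(self, action):  # action is -1 (left) or +1 (right)
        self.state += action
        if self.state == 0:
            return self.state, -1.0, True
        if self.state == 4:
            return self.state, 1.0, True
        return self.state, 0.0, False


class ActionPruningProtocol:
    """Wraps an environment and overrides actions the teacher disallows.
    The agent is a black box: only its chosen actions pass through here."""

    def __init__(self, env, is_allowed, fallback):
        self.env = env
        self.is_allowed = is_allowed  # teacher's rule: (state, action) -> bool
        self.fallback = fallback      # teacher's replacement: state -> action
        self.last_state = None

    def reset(self):
        self.last_state = self.env.reset()
        return self.last_state

    def step(self, action):
        if not self.is_allowed(self.last_state, action):
            action = self.fallback(self.last_state)  # prune and substitute
        self.last_state, reward, done = self.env.step(action)
        return self.last_state, reward, done


def run_episode(env, policy, max_steps=20):
    """Runs any policy (state -> action) and returns the summed reward."""
    state, total = env.reset(), 0.0
    for _ in range(max_steps):
        state, reward, done = env.step(policy(state))
        total += reward
        if done:
            break
    return total


def always_left(state):
    return -1  # a deliberately bad agent that walks toward the -1 terminal


# Teacher forbids stepping left from position 1 and suggests stepping right.
safe_env = ActionPruningProtocol(
    LineWorld(),
    is_allowed=lambda s, a: not (s == 1 and a == -1),
    fallback=lambda s: +1,
)
print(run_episode(LineWorld(), always_left))  # unprotected agent
print(run_episode(safe_env, always_left))     # same agent behind the wrapper
```

Because the wrapper exposes the same reset and step interface as the environment, the same agent code runs with or without the teacher. Reward shaping could be sketched the same way by modifying the reward returned from step.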

Theano: A Python framework for fast computation of mathematical expressions

May 09, 2016

The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov (+103 more)

Abstract: Theano is a Python library that allows users to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers, especially in the machine learning community, and has shown steady performance improvements. Theano has been actively and continuously developed since 2008; multiple frameworks have been built on top of it, and it has been used to produce many state-of-the-art machine learning models. The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.

* 19 pages, 5 figures
