Alert button
Picture for John Salvatier

John Salvatier

Alert button

Active Reinforcement Learning: Observing Rewards at a Cost

Nov 24, 2020
David Krueger, Jan Leike, Owain Evans, John Salvatier

Figure 1 for Active Reinforcement Learning: Observing Rewards at a Cost
Figure 2 for Active Reinforcement Learning: Observing Rewards at a Cost
Figure 3 for Active Reinforcement Learning: Observing Rewards at a Cost
Figure 4 for Active Reinforcement Learning: Observing Rewards at a Cost

Active reinforcement learning (ARL) is a variant on reinforcement learning where the agent does not observe the reward unless it chooses to pay a query cost c > 0. The central question of ARL is how to quantify the long-term value of reward information. Even in multi-armed bandits, computing the value of this information is intractable and we have to rely on heuristics. We propose and evaluate several heuristic approaches for ARL in multi-armed bandits and (tabular) Markov decision processes, and discuss and illustrate some challenging aspects of the ARL problem.

* Originally appeared at the NeurIPS 2016 "Future of Interactive Learning Machines (FILM)" workshop 
Viaarxiv icon

When Will AI Exceed Human Performance? Evidence from AI Experts

May 03, 2018
Katja Grace, John Salvatier, Allan Dafoe, Baobao Zhang, Owain Evans

Figure 1 for When Will AI Exceed Human Performance? Evidence from AI Experts
Figure 2 for When Will AI Exceed Human Performance? Evidence from AI Experts
Figure 3 for When Will AI Exceed Human Performance? Evidence from AI Experts
Figure 4 for When Will AI Exceed Human Performance? Evidence from AI Experts

Advances in artificial intelligence (AI) will transform modern life by reshaping transportation, health, science, finance, and the military. To adapt public policy, we need to better anticipate these advances. Here we report the results from a large survey of machine learning researchers on their beliefs about progress in AI. Researchers predict AI will outperform humans in many activities in the next ten years, such as translating languages (by 2024), writing high-school essays (by 2026), driving a truck (by 2027), working in retail (by 2031), writing a bestselling book (by 2049), and working as a surgeon (by 2053). Researchers believe there is a 50% chance of AI outperforming humans in all tasks in 45 years and of automating all human jobs in 120 years, with Asian respondents expecting these dates much sooner than North Americans. These results will inform discussion amongst researchers and policymakers about anticipating and managing trends in AI.

* Accepted by Journal of Artificial Intelligence Research (AI and Society Track). Minor update to refer to related work (page 5) 
Viaarxiv icon

Agent-Agnostic Human-in-the-Loop Reinforcement Learning

Jan 15, 2017
David Abel, John Salvatier, Andreas Stuhlmüller, Owain Evans

Figure 1 for Agent-Agnostic Human-in-the-Loop Reinforcement Learning
Figure 2 for Agent-Agnostic Human-in-the-Loop Reinforcement Learning
Figure 3 for Agent-Agnostic Human-in-the-Loop Reinforcement Learning
Figure 4 for Agent-Agnostic Human-in-the-Loop Reinforcement Learning

Providing Reinforcement Learning agents with expert advice can dramatically improve various aspects of learning. Prior work has developed teaching protocols that enable agents to learn efficiently in complex environments; many of these methods tailor the teacher's guidance to agents with a particular representation or underlying learning scheme, offering effective but specialized teaching procedures. In this work, we explore protocol programs, an agent-agnostic schema for Human-in-the-Loop Reinforcement Learning. Our goal is to incorporate the beneficial properties of a human teacher into Reinforcement Learning without making strong assumptions about the inner workings of the agent. We show how to represent existing approaches such as action pruning, reward shaping, and training in simulation as special cases of our schema and conduct preliminary experiments on simple domains.

* Presented at the NIPS Workshop on the Future of Interactive Learning Machines, 2016 
Viaarxiv icon

Theano: A Python framework for fast computation of mathematical expressions

May 09, 2016
The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano, Tim Cooijmans, Marc-Alexandre Côté, Myriam Côté, Aaron Courville, Yann N. Dauphin, Olivier Delalleau, Julien Demouth, Guillaume Desjardins, Sander Dieleman, Laurent Dinh, Mélanie Ducoffe, Vincent Dumoulin, Samira Ebrahimi Kahou, Dumitru Erhan, Ziye Fan, Orhan Firat, Mathieu Germain, Xavier Glorot, Ian Goodfellow, Matt Graham, Caglar Gulcehre, Philippe Hamel, Iban Harlouchet, Jean-Philippe Heng, Balázs Hidasi, Sina Honari, Arjun Jain, Sébastien Jean, Kai Jia, Mikhail Korobov, Vivek Kulkarni, Alex Lamb, Pascal Lamblin, Eric Larsen, César Laurent, Sean Lee, Simon Lefrancois, Simon Lemieux, Nicholas Léonard, Zhouhan Lin, Jesse A. Livezey, Cory Lorenz, Jeremiah Lowin, Qianli Ma, Pierre-Antoine Manzagol, Olivier Mastropietro, Robert T. McGibbon, Roland Memisevic, Bart van Merriënboer, Vincent Michalski, Mehdi Mirza, Alberto Orlandi, Christopher Pal, Razvan Pascanu, Mohammad Pezeshki, Colin Raffel, Daniel Renshaw, Matthew Rocklin, Adriana Romero, Markus Roth, Peter Sadowski, John Salvatier, François Savard, Jan Schlüter, John Schulman, Gabriel Schwartz, Iulian Vlad Serban, Dmitriy Serdyuk, Samira Shabanian, Étienne Simon, Sigurd Spieckermann, S. Ramana Subramanyam, Jakub Sygnowski, Jérémie Tanguay, Gijs van Tulder, Joseph Turian, Sebastian Urban, Pascal Vincent, Francesco Visin, Harm de Vries, David Warde-Farley, Dustin J. Webb, Matthew Willson, Kelvin Xu, Lijun Xue, Li Yao, Saizheng Zhang, Ying Zhang

Figure 1 for Theano: A Python framework for fast computation of mathematical expressions
Figure 2 for Theano: A Python framework for fast computation of mathematical expressions
Figure 3 for Theano: A Python framework for fast computation of mathematical expressions
Figure 4 for Theano: A Python framework for fast computation of mathematical expressions

Theano is a Python library that allows to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano is being actively and continuously developed since 2008, multiple frameworks have been built on top of it and it has been used to produce many state-of-the-art machine learning models. The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently-introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.

* 19 pages, 5 figures 
Viaarxiv icon