Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Lars Buesing

Causally Correct Partial Models for Reinforcement Learning

Feb 07, 2020

Danilo J. Rezende, Ivo Danihelka, George Papamakarios, Nan Rosemary Ke, Ray Jiang, Theophane Weber, Karol Gregor, Hamza Merzic, Fabio Viola, Jane Wang(+4 more)

Figure 1 for Causally Correct Partial Models for Reinforcement Learning

Figure 2 for Causally Correct Partial Models for Reinforcement Learning

Figure 3 for Causally Correct Partial Models for Reinforcement Learning

Figure 4 for Causally Correct Partial Models for Reinforcement Learning

Abstract:In reinforcement learning, we can learn a model of future observations and rewards, and use it to plan the agent's next actions. However, jointly modeling future observations can be computationally expensive or even intractable if the observations are high-dimensional (e.g. images). For this reason, previous works have considered partial models, which model only part of the observation. In this paper, we show that partial models can be causally incorrect: they are confounded by the observations they don't model, and can therefore lead to incorrect planning. To address this, we introduce a general family of partial models that are provably causally correct, yet remain fast because they do not need to fully model future observations.

Via

Access Paper or Ask Questions

Combining Q-Learning and Search with Amortized Value Estimates

Jan 10, 2020

Jessica B. Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Tobias Pfaff, Theophane Weber, Lars Buesing, Peter W. Battaglia

Figure 1 for Combining Q-Learning and Search with Amortized Value Estimates

Figure 2 for Combining Q-Learning and Search with Amortized Value Estimates

Figure 3 for Combining Q-Learning and Search with Amortized Value Estimates

Figure 4 for Combining Q-Learning and Search with Amortized Value Estimates

Abstract:We introduce "Search with Amortized Value Estimates" (SAVE), an approach for combining model-free Q-learning with model-based Monte-Carlo Tree Search (MCTS). In SAVE, a learned prior over state-action values is used to guide MCTS, which estimates an improved set of state-action values. The new Q-estimates are then used in combination with real experience to update the prior. This effectively amortizes the value computation performed by MCTS, resulting in a cooperative relationship between model-free learning and model-based search. SAVE can be implemented on top of any Q-learning agent with access to a model, which we demonstrate by incorporating it into agents that perform challenging physical reasoning tasks and Atari. SAVE consistently achieves higher rewards with fewer training steps, and---in contrast to typical model-based search approaches---yields strong performance with very small search budgets. By combining real experience with information computed during search, SAVE demonstrates that it is possible to improve on both the performance of model-free learning and the computational cost of planning.

* Published as a conference paper at ICLR 2020

Via

Access Paper or Ask Questions

Approximate Inference in Discrete Distributions with Monte Carlo Tree Search and Value Functions

Oct 15, 2019

Lars Buesing, Nicolas Heess, Theophane Weber

Figure 1 for Approximate Inference in Discrete Distributions with Monte Carlo Tree Search and Value Functions

Figure 2 for Approximate Inference in Discrete Distributions with Monte Carlo Tree Search and Value Functions

Figure 3 for Approximate Inference in Discrete Distributions with Monte Carlo Tree Search and Value Functions

Figure 4 for Approximate Inference in Discrete Distributions with Monte Carlo Tree Search and Value Functions

Abstract:A plethora of problems in AI, engineering and the sciences are naturally formalized as inference in discrete probabilistic models. Exact inference is often prohibitively expensive, as it may require evaluating the (unnormalized) target density on its entire domain. Here we consider the setting where only a limited budget of calls to the unnormalized density oracle is available, raising the challenge of where in the domain to allocate these function calls in order to construct a good approximate solution. We formulate this problem as an instance of sequential decision-making under uncertainty and leverage methods from reinforcement learning for probabilistic inference with budget constraints. In particular, we propose the TreeSample algorithm, an adaptation of Monte Carlo Tree Search to approximate inference. This algorithm caches all previous queries to the density oracle in an explicit search tree, and dynamically allocates new queries based on a "best-first" heuristic for exploration, using existing upper confidence bound methods. Our non-parametric inference method can be effectively combined with neural networks that compile approximate conditionals of the target, which are then used to guide the inference search and enable generalization across multiple target distributions. We show empirically that TreeSample outperforms standard approximate inference methods on synthetic factor graphs.

Via

Access Paper or Ask Questions

Credit Assignment Techniques in Stochastic Computation Graphs

Jan 07, 2019

Théophane Weber, Nicolas Heess, Lars Buesing, David Silver

Figure 1 for Credit Assignment Techniques in Stochastic Computation Graphs

Figure 2 for Credit Assignment Techniques in Stochastic Computation Graphs

Figure 3 for Credit Assignment Techniques in Stochastic Computation Graphs

Figure 4 for Credit Assignment Techniques in Stochastic Computation Graphs

Abstract:Stochastic computation graphs (SCGs) provide a formalism to represent structured optimization problems arising in artificial intelligence, including supervised, unsupervised, and reinforcement learning. Previous work has shown that an unbiased estimator of the gradient of the expected loss of SCGs can be derived from a single principle. However, this estimator often has high variance and requires a full model evaluation per data point, making this algorithm costly in large graphs. In this work, we address these problems by generalizing concepts from the reinforcement learning literature. We introduce the concepts of value functions, baselines and critics for arbitrary SCGs, and show how to use them to derive lower-variance gradient estimates from partial model evaluations, paving the way towards general and efficient credit assignment for gradient-based optimization. In doing so, we demonstrate how our results unify recent advances in the probabilistic inference and reinforcement learning literature.

Via

Access Paper or Ask Questions

Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search

Nov 15, 2018

Lars Buesing, Theophane Weber, Yori Zwols, Sebastien Racaniere, Arthur Guez, Jean-Baptiste Lespiau, Nicolas Heess

Figure 1 for Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search

Figure 2 for Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search

Figure 3 for Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search

Figure 4 for Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search

Abstract:Learning policies on data synthesized by models can in principle quench the thirst of reinforcement learning algorithms for large amounts of real experience, which is often costly to acquire. However, simulating plausible experience de novo is a hard problem for many complex environments, often resulting in biases for model-based policy evaluation and search. Instead of de novo synthesis of data, here we assume logged, real experience and model alternative outcomes of this experience under counterfactual actions, actions that were not actually taken. Based on this, we propose the Counterfactually-Guided Policy Search (CF-GPS) algorithm for learning policies in POMDPs from off-policy experience. It leverages structural causal models for counterfactual evaluation of arbitrary policies on individual off-policy episodes. CF-GPS can improve on vanilla model-based RL algorithms by making use of available logged data to de-bias model predictions. In contrast to off-policy algorithms based on Importance Sampling which re-weight data, CF-GPS leverages a model to explicitly consider alternative outcomes, allowing the algorithm to make better use of experience data. We find empirically that these advantages translate into improved policy evaluation and search results on a non-trivial grid-world task. Finally, we show that CF-GPS generalizes the previously proposed Guided Policy Search and that reparameterization-based algorithms such Stochastic Value Gradient can be interpreted as counterfactual methods.

Via

Access Paper or Ask Questions

Imagination-Augmented Agents for Deep Reinforcement Learning

Feb 14, 2018

Théophane Weber, Sébastien Racanière, David P. Reichert, Lars Buesing, Arthur Guez, Danilo Jimenez Rezende, Adria Puigdomènech Badia, Oriol Vinyals, Nicolas Heess, Yujia Li(+5 more)

Figure 1 for Imagination-Augmented Agents for Deep Reinforcement Learning

Figure 2 for Imagination-Augmented Agents for Deep Reinforcement Learning

Figure 3 for Imagination-Augmented Agents for Deep Reinforcement Learning

Figure 4 for Imagination-Augmented Agents for Deep Reinforcement Learning

Abstract:We introduce Imagination-Augmented Agents (I2As), a novel architecture for deep reinforcement learning combining model-free and model-based aspects. In contrast to most existing model-based reinforcement learning and planning methods, which prescribe how a model should be used to arrive at a policy, I2As learn to interpret predictions from a learned environment model to construct implicit plans in arbitrary ways, by using the predictions as additional context in deep policy networks. I2As show improved data efficiency, performance, and robustness to model misspecification compared to several baselines.

Via

Access Paper or Ask Questions

Learning and Querying Fast Generative Models for Reinforcement Learning

Feb 08, 2018

Lars Buesing, Theophane Weber, Sebastien Racaniere, S. M. Ali Eslami, Danilo Rezende, David P. Reichert, Fabio Viola, Frederic Besse, Karol Gregor, Demis Hassabis(+1 more)

Figure 1 for Learning and Querying Fast Generative Models for Reinforcement Learning

Figure 2 for Learning and Querying Fast Generative Models for Reinforcement Learning

Figure 3 for Learning and Querying Fast Generative Models for Reinforcement Learning

Figure 4 for Learning and Querying Fast Generative Models for Reinforcement Learning

Abstract:A key challenge in model-based reinforcement learning (RL) is to synthesize computationally efficient and accurate environment models. We show that carefully designed generative models that learn and operate on compact state representations, so-called state-space models, substantially reduce the computational costs for predicting outcomes of sequences of actions. Extensive experiments establish that state-space models accurately capture the dynamics of Atari games from the Arcade Learning Environment from raw pixels. The computational speed-up of state-space models while maintaining high accuracy makes their application in RL feasible: We demonstrate that agents which query these models for decision making outperform strong model-free baselines on the game MSPACMAN, demonstrating the potential of using learned environment models for planning.

Via

Access Paper or Ask Questions

Fast amortized inference of neural activity from calcium imaging data with variational autoencoders

Nov 06, 2017

Artur Speiser, Jinyao Yan, Evan Archer, Lars Buesing, Srinivas C. Turaga, Jakob H. Macke

Figure 1 for Fast amortized inference of neural activity from calcium imaging data with variational autoencoders

Figure 2 for Fast amortized inference of neural activity from calcium imaging data with variational autoencoders

Figure 3 for Fast amortized inference of neural activity from calcium imaging data with variational autoencoders

Figure 4 for Fast amortized inference of neural activity from calcium imaging data with variational autoencoders

Abstract:Calcium imaging permits optical measurement of neural activity. Since intracellular calcium concentration is an indirect measurement of neural activity, computational tools are necessary to infer the true underlying spiking activity from fluorescence measurements. Bayesian model inversion can be used to solve this problem, but typically requires either computationally expensive MCMC sampling, or faster but approximate maximum-a-posteriori optimization. Here, we introduce a flexible algorithmic framework for fast, efficient and accurate extraction of neural spikes from imaging data. Using the framework of variational autoencoders, we propose to amortize inference by training a deep neural network to perform model inversion efficiently. The recognition network is trained to produce samples from the posterior distribution over spike trains. Once trained, performing inference amounts to a fast single forward pass through the network, without the need for iterative optimization or sampling. We show that amortization can be applied flexibly to a wide range of nonlinear generative models and significantly improves upon the state of the art in computation time, while achieving competitive accuracy. Our framework is also able to represent posterior distributions over spike-trains. We demonstrate the generality of our method by proposing the first probabilistic approach for separating backpropagating action potentials from putative synaptic inputs in calcium imaging of dendritic spines.

* NIPS 2017

Via

Access Paper or Ask Questions

Learning model-based planning from scratch

Jul 19, 2017

Razvan Pascanu, Yujia Li, Oriol Vinyals, Nicolas Heess, Lars Buesing, Sebastien Racanière, David Reichert, Théophane Weber, Daan Wierstra, Peter Battaglia

Figure 1 for Learning model-based planning from scratch

Figure 2 for Learning model-based planning from scratch

Figure 3 for Learning model-based planning from scratch

Figure 4 for Learning model-based planning from scratch

Abstract:Conventional wisdom holds that model-based planning is a powerful approach to sequential decision-making. It is often very challenging in practice, however, because while a model can be used to evaluate a plan, it does not prescribe how to construct a plan. Here we introduce the "Imagination-based Planner", the first model-based, sequential decision-making agent that can learn to construct, evaluate, and execute plans. Before any action, it can perform a variable number of imagination steps, which involve proposing an imagined action and evaluating it with its model-based imagination. All imagined actions and outcomes are aggregated, iteratively, into a "plan context" which conditions future real and imagined actions. The agent can even decide how to imagine: testing out alternative imagined actions, chaining sequences of actions together, or building a more complex "imagination tree" by navigating flexibly among the previously imagined states using a learned policy. And our agent can learn to plan economically, jointly optimizing for external rewards and computational costs associated with using its imagination. We show that our architecture can learn to solve a challenging continuous control problem, and also learn elaborate planning strategies in a discrete maze-solving task. Our work opens a new direction toward learning the components of a model-based planning system and how to use them.

Via

Access Paper or Ask Questions

Bayesian Manifold Learning: The Locally Linear Latent Variable Model (LL-LVM)

Dec 01, 2015

Mijung Park, Wittawat Jitkrittum, Ahmad Qamar, Zoltan Szabo, Lars Buesing, Maneesh Sahani

Figure 1 for Bayesian Manifold Learning: The Locally Linear Latent Variable Model (LL-LVM)

Figure 2 for Bayesian Manifold Learning: The Locally Linear Latent Variable Model (LL-LVM)

Figure 3 for Bayesian Manifold Learning: The Locally Linear Latent Variable Model (LL-LVM)

Figure 4 for Bayesian Manifold Learning: The Locally Linear Latent Variable Model (LL-LVM)

Abstract:We introduce the Locally Linear Latent Variable Model (LL-LVM), a probabilistic model for non-linear manifold discovery that describes a joint distribution over observations, their manifold coordinates and locally linear maps conditioned on a set of neighbourhood relationships. The model allows straightforward variational optimisation of the posterior distribution on coordinates and locally linear maps from the latent space to the observation space given the data. Thus, the LL-LVM encapsulates the local-geometry preserving intuitions that underlie non-probabilistic methods such as locally linear embedding (LLE). Its probabilistic semantics make it easy to evaluate the quality of hypothesised neighbourhood relationships, select the intrinsic dimensionality of the manifold, construct out-of-sample extensions and to combine the manifold model with additional probabilistic models that capture the structure of coordinates within the manifold.

* accepted to NIPS 2015

Via

Access Paper or Ask Questions