Özgür Şimşek

Creating Multi-Level Skill Hierarchies in Reinforcement Learning

Jun 16, 2023
Joshua B. Evans, Özgür Şimşek

What is a useful skill hierarchy for an autonomous agent? We propose an answer based on the graphical structure of an agent's interaction with its environment. Our approach uses hierarchical graph partitioning to expose the structure of the graph at varying timescales, producing a skill hierarchy with multiple levels of abstraction. At each level of the hierarchy, skills move the agent between regions of the state space that are well connected within themselves but weakly connected to each other. We illustrate the utility of the proposed skill hierarchy in a wide variety of domains in the context of reinforcement learning.

* 19 pages, 12 figures 
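As a rough illustration of the idea, the sketch below recursively partitions a state-transition graph into regions that are well connected internally but weakly connected to each other, with each level of the recursion corresponding to one level of the skill hierarchy. This is not the paper's implementation: Louvain community detection from networkx is used purely as a stand-in partitioning method, and all names and parameters are illustrative.

```python
# Illustrative sketch only: recursive partitioning of a state-transition
# graph into weakly inter-connected regions. Louvain community detection
# is a stand-in; the paper's actual partitioning method may differ.
import networkx as nx
from networkx.algorithms.community import louvain_communities

def skill_hierarchy(graph, max_depth=3, min_size=4):
    """Recursively partition the graph; each level of the resulting tree
    corresponds to one level of the skill hierarchy, and skills would move
    the agent between sibling regions."""
    if max_depth == 0 or graph.number_of_nodes() < min_size:
        return {"states": set(graph.nodes), "children": []}
    communities = louvain_communities(graph, seed=0)
    if len(communities) <= 1:
        return {"states": set(graph.nodes), "children": []}
    return {
        "states": set(graph.nodes),
        "children": [
            skill_hierarchy(graph.subgraph(c).copy(), max_depth - 1, min_size)
            for c in communities
        ],
    }

# Toy example: two dense clusters joined by a single bridge node.
G = nx.barbell_graph(6, 1)
tree = skill_hierarchy(G)
print([sorted(child["states"]) for child in tree["children"]])
```

On the barbell graph, the top level separates the two dense clusters, which is exactly the kind of weakly connected region the abstract describes.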

Explaining Reinforcement Learning with Shapley Values

Jun 09, 2023
Daniel Beechey, Thomas M. S. Smith, Özgür Şimşek

For reinforcement learning systems to be widely adopted, their users must understand and trust them. We present a theoretical analysis of explaining reinforcement learning using Shapley values, following a principled approach from game theory for identifying the contribution of individual players to the outcome of a cooperative game. We call this general framework Shapley Values for Explaining Reinforcement Learning (SVERL). Our analysis exposes the limitations of earlier uses of Shapley values in reinforcement learning. We then develop an approach that uses Shapley values to explain agent performance. In a variety of domains, SVERL produces meaningful explanations that match and supplement human intuition.

* 12 pages, 9 figures. Accepted at ICML 2023 
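For intuition about the game-theoretic machinery, here is a minimal exact Shapley computation over feature coalitions. The characteristic function `value_fn` is a hypothetical stand-in for the carefully derived characteristic functions in SVERL; the code only illustrates the underlying averaging of marginal contributions.

```python
# Minimal exact Shapley values: average marginal contribution of each
# feature over all coalitions. `value_fn` is a hypothetical stand-in for
# SVERL's characteristic function, not the paper's formulation.
from itertools import combinations
from math import factorial

def shapley_values(n_features, value_fn):
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for r in range(len(others) + 1):
            for coalition in combinations(others, r):
                # Weight |S|! (n - |S| - 1)! / n! from the Shapley formula.
                weight = factorial(r) * factorial(n_features - r - 1) / factorial(n_features)
                with_i = value_fn(frozenset(coalition) | {i})
                without_i = value_fn(frozenset(coalition))
                phi[i] += weight * (with_i - without_i)
    return phi

# Toy characteristic function: value is the sum of the present features' weights.
weights = [1.0, 3.0, 0.5]
v = lambda s: sum(weights[i] for i in s)
print(shapley_values(3, v))  # recovers the weights, as expected for an additive game
```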

Resource-Constrained Station-Keeping for Helium Balloons using Reinforcement Learning

Mar 02, 2023
Jack Saunders, Loïc Prenevost, Özgür Şimşek, Alan Hunter, Wenbin Li

High-altitude balloons have proved useful for ecological aerial surveys, atmospheric monitoring, and communication relays. However, because of weight and power constraints, alternative modes of propulsion are needed to navigate in the stratosphere. Recently, reinforcement learning has been proposed as a control scheme for keeping a balloon near a fixed location by exploiting opposing wind fields at different altitudes. Although station-keeping with air-pump actuation has been explored, the control problem for balloons actuated by venting and ballasting, a common low-cost alternative, has not been studied. We show how reinforcement learning can be applied to this type of balloon. Specifically, we use the soft actor-critic algorithm, which on average station-keeps within 50 km for 25% of the flight, consistent with the state of the art. We also show that the proposed controller effectively minimises resource consumption, supporting long-duration flights. We frame station-keeping as a continuous-control reinforcement learning problem, which allows a more diverse range of trajectories than the discrete action spaces used in current state-of-the-art work. Continuous control also permits larger ascent rates than are possible with air pumps. The desired ascent rate is decoupled into a desired altitude and a time factor, yielding a more transparent policy than the low-level control commands used in previous work. Finally, by applying the equations of motion, we establish thresholds on venting and ballasting that keep actions physically feasible and prevent the agent from exploiting the simulated environment.
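To make the decoupling concrete, the sketch below maps a policy output of (desired altitude, time factor) to a bounded ascent rate, with clipping standing in for the venting and ballasting feasibility thresholds. All constants and names are assumptions for illustration, not values from the paper.

```python
# Hypothetical sketch of the action decoupling described above: the policy
# outputs a desired altitude and a time factor, which are converted into an
# ascent rate and clipped to assumed venting/ballasting limits. Constants
# are illustrative, not taken from the paper.
import numpy as np

MAX_VENT_DESCENT = -2.0   # m/s, assumed lower bound achievable by venting
MAX_BALLAST_ASCENT = 3.0  # m/s, assumed upper bound achievable by ballasting

def desired_ascent_rate(current_alt, desired_alt, time_factor):
    """Translate (desired altitude, time factor) into a bounded ascent rate."""
    raw_rate = (desired_alt - current_alt) / max(time_factor, 1e-3)
    return float(np.clip(raw_rate, MAX_VENT_DESCENT, MAX_BALLAST_ASCENT))

# Example: climb 500 m over a nominal 120 s window; the request is clipped
# to the feasible ascent-rate envelope.
print(desired_ascent_rate(current_alt=18_000.0, desired_alt=18_500.0, time_factor=120.0))
```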

Iterative Policy-Space Expansion in Reinforcement Learning

Dec 05, 2019
Jan Malte Lichtenberg, Özgür Şimşek

Humans and animals solve difficult problems much more easily when they are presented with a sequence of problems that starts simple and gradually increases in difficulty. We explore this idea in the context of reinforcement learning. Rather than following an externally provided curriculum of progressively more difficult tasks, the agent solves a single task using a progressively less constrained policy space. The algorithm we propose first learns to categorize features as positive or negative before gradually learning a more refined policy. Experimental results in Tetris demonstrate a superior learning rate compared to existing algorithms.

* Workshop on Biological and Artificial Reinforcement Learning at the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada 
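The staged idea can be sketched for a linear evaluation policy: first restrict weights to unit magnitude so that only each feature's sign is learned, then lift the constraint and fit full weights. Everything below (features, targets, fitting procedure) is illustrative and not the paper's algorithm or its Tetris features.

```python
# Minimal sketch of the staged idea, assuming a linear evaluation policy:
# stage 1 learns only the sign of each feature weight (a heavily constrained
# policy space); stage 2 expands the space to arbitrary magnitudes.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                      # feature values of candidate moves
true_w = np.array([2.0, -1.0, 0.5, -3.0, 1.5])
y = X @ true_w + rng.normal(scale=0.1, size=200)   # noisy move scores

# Stage 1: constrained policy space -- each weight is only +1 or -1.
signs = np.sign(X.T @ y)                           # crude per-feature sign estimate
err_stage1 = np.mean((X @ signs - y) ** 2)

# Stage 2: expanded policy space -- arbitrary weight magnitudes.
w = np.linalg.lstsq(X, y, rcond=None)[0]
err_stage2 = np.mean((X @ w - y) ** 2)

print(f"sign-only policy MSE: {err_stage1:.2f}, refined policy MSE: {err_stage2:.2f}")
```

The sign-only stage already captures the direction of each feature's influence, which is the cheap-to-learn structure the refined policy then builds on.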

The Game of Tetris in Machine Learning

May 10, 2019
Simón Algorta, Özgür Şimşek

The game of Tetris is an important benchmark for research in artificial intelligence and machine learning. This paper provides a historical account of the algorithmic developments in Tetris and discusses open challenges. Handcrafted controllers, genetic algorithms, and reinforcement learning have all contributed to good solutions. However, existing solutions fall far short of what can be achieved by expert players playing without time pressure. Further study of the game has the potential to contribute to important areas of research, including feature discovery, autonomous learning of action hierarchies, and sample-efficient reinforcement learning.
