Stefano V. Albrecht

Explainable AI for Safe and Trustworthy Autonomous Driving: A Systematic Review

Feb 08, 2024

Anton Kuznietsov, Balint Gyevnar, Cheng Wang, Steven Peters, Stefano V. Albrecht

Abstract:Artificial Intelligence (AI) shows promising applications for the perception and planning tasks in autonomous driving (AD) due to its superior performance compared to conventional methods. However, inscrutable AI systems exacerbate the existing challenge of safety assurance of AD. One way to mitigate this challenge is to utilize explainable AI (XAI) techniques. To this end, we present the first comprehensive systematic literature review of explainable methods for safe and trustworthy AD. We begin by analyzing the requirements for AI in the context of AD, focusing on three key aspects: data, model, and agency. We find that XAI is fundamental to meeting these requirements. Based on this, we explain the sources of explanations in AI and describe a taxonomy of XAI. We then identify five key contributions of XAI for safe and trustworthy AI in AD, which are interpretable design, interpretable surrogate models, interpretable monitoring, auxiliary explanations, and interpretable validation. Finally, we propose a modular framework called SafeX to integrate these contributions, enabling explanation delivery to users while simultaneously ensuring the safety of AI models.

ICED: Zero-Shot Transfer in Reinforcement Learning via In-Context Environment Design

Feb 05, 2024

Samuel Garcin, James Doran, Shangmin Guo, Christopher G. Lucas, Stefano V. Albrecht

Abstract:Autonomous agents trained using deep reinforcement learning (RL) often lack the ability to successfully generalise to new environments, even when they share characteristics with the environments they have encountered during training. In this work, we investigate how the sampling of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents. We discover that, for deep actor-critic architectures sharing their base layers, prioritising levels according to their value loss minimises the mutual information between the agent's internal representation and the set of training levels in the generated training data. This provides a novel theoretical justification for the implicit regularisation achieved by certain adaptive sampling strategies. We then turn our attention to unsupervised environment design (UED) methods, which have more control over the data generation mechanism. We find that existing UED methods can significantly shift the training distribution, which translates to low ZSG performance. To prevent both overfitting and distributional shift, we introduce in-context environment design (ICED). ICED generates levels using a variational autoencoder trained over an initial set of level parameters, reducing distributional shift, and achieves significant improvements in ZSG over adaptive level sampling strategies and UED methods.

* arXiv admin note: substantial text overlap with arXiv:2310.03494

Sample Relationship from Learning Dynamics Matters for Generalisation

Jan 16, 2024

Shangmin Guo, Yi Ren, Stefano V. Albrecht, Kenny Smith

Abstract:Although much research has been done on proposing new models or loss functions to improve the generalisation of artificial neural networks (ANNs), less attention has been directed to the impact of the training data on generalisation. In this work, we start from approximating the interaction between samples, i.e. how learning one sample would modify the model's prediction on other samples. Through analysing the terms involved in weight updates in supervised learning, we find that labels influence the interaction between samples. Therefore, we propose the labelled pseudo Neural Tangent Kernel (lpNTK) which takes label information into consideration when measuring the interactions between samples. We first prove that lpNTK asymptotically converges to the empirical neural tangent kernel in terms of the Frobenius norm under certain assumptions. Secondly, we illustrate how lpNTK helps to understand learning phenomena identified in previous work, specifically the learning difficulty of samples and forgetting events during learning. Moreover, we also show that using lpNTK to identify and remove poisoning training samples does not hurt the generalisation performance of ANNs.

* ICLR-2024
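
The claim that labels influence the interaction between samples can be illustrated with a standard first-order argument. This is only a sketch, not the paper's definition of lpNTK; the symbols below (learning rate $\eta$, logits $z_\theta$, softmax output $p_\theta$, one-hot label $y_u$, kernel $K_\theta$) are notation assumed here for illustration, and the loss is assumed to be cross-entropy.

```latex
% One SGD step on the sample (x_u, y_u):
\theta' = \theta - \eta \, \nabla_\theta \mathcal{L}\big(z_\theta(x_u), y_u\big)

% First-order Taylor expansion of the logits on another sample x_o:
z_{\theta'}(x_o) - z_\theta(x_o)
  \approx -\eta \, K_\theta(x_o, x_u) \, \big(p_\theta(x_u) - y_u\big),
\qquad
K_\theta(x_o, x_u) = \nabla_\theta z_\theta(x_o) \, \nabla_\theta z_\theta(x_u)^{\top}

% K_\theta is the empirical neural tangent kernel; the label y_u enters
% only through the residual p_\theta(x_u) - y_u, the gradient of the
% cross-entropy loss with respect to the logits.
```

Under these assumptions, two samples with the same kernel value can still push each other's predictions in different directions depending on their labels, which is the kind of effect a label-aware kernel is meant to capture.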

Is Feedback All You Need? Leveraging Natural Language Feedback in Goal-Conditioned Reinforcement Learning

Dec 07, 2023

Sabrina McCallum, Max Taylor-Davies, Stefano V. Albrecht, Alessandro Suglia

Abstract:Despite numerous successes, the field of reinforcement learning (RL) remains far from matching the impressive generalisation power of human behaviour learning. One possible way to help bridge this gap is to provide RL agents with richer, more human-like feedback expressed in natural language. To investigate this idea, we first extend BabyAI to automatically generate language feedback from the environment dynamics and goal condition success. Then, we modify the Decision Transformer architecture to take advantage of this additional signal. We find that training with language feedback either in place of or in addition to the return-to-go or goal descriptions improves agents' generalisation performance, and that agents can benefit from feedback even when this is only available during training, but not at inference.

* Accepted at Workshop on Goal-conditioned Reinforcement Learning, NeurIPS 2023

Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning

Oct 09, 2023

Trevor McInroe, Stefano V. Albrecht, Amos Storkey

Abstract:Offline pretraining with a static dataset followed by online fine-tuning (offline-to-online, or OtO) is a paradigm that is well matched to a real-world RL deployment process: in few real settings would one deploy an offline policy with no test runs and tuning. In this scenario, we aim to find the best-performing policy within a limited budget of online interactions. Previous work in the OtO setting has focused on correcting for bias introduced by the policy-constraint mechanisms of offline RL algorithms. Such constraints keep the learned policy close to the behavior policy that collected the dataset, but this unnecessarily limits policy performance if the behavior policy is far from optimal. Instead, we forgo policy constraints and frame OtO RL as an exploration problem: we must maximize the benefit of the online data-collection. We study major online RL exploration paradigms, adapting them to work well with the OtO setting. These adapted methods contribute several strong baselines. Also, we introduce an algorithm for planning to go out of distribution (PTGOOD), which targets online exploration in relatively high-reward regions of the state-action space unlikely to be visited by the behavior policy. By leveraging concepts from the Conditional Entropy Bottleneck, PTGOOD encourages data collected online to provide new information relevant to improving the final deployment policy. In that way the limited interaction budget is used effectively. We show that PTGOOD significantly improves agent returns during online fine-tuning and finds the optimal policy in as few as 10k online steps in Walker and in as few as 50k in complex control tasks like Humanoid. Also, we find that PTGOOD avoids the suboptimal policy convergence that many of our baselines exhibit in several environments.

* 9 pages, 12 figures, preprint

How the level sampling process impacts zero-shot generalisation in deep reinforcement learning

Oct 05, 2023

Samuel Garcin, James Doran, Shangmin Guo, Christopher G. Lucas, Stefano V. Albrecht

Abstract:A key limitation preventing the wider adoption of autonomous agents trained via deep reinforcement learning (RL) is their limited ability to generalise to new environments, even when these share similar characteristics with environments encountered during training. In this work, we investigate how a non-uniform sampling strategy of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents, considering two failure modes: overfitting and over-generalisation. As a first step, we measure the mutual information (MI) between the agent's internal representation and the set of training levels, which we find to be well-correlated to instance overfitting. In contrast to uniform sampling, adaptive sampling strategies prioritising levels based on their value loss are more effective at maintaining lower MI, which provides a novel theoretical justification for this class of techniques. We then turn our attention to unsupervised environment design (UED) methods, which adaptively generate new training levels and minimise MI more effectively than methods sampling from a fixed set. However, we find UED methods significantly shift the training distribution, resulting in over-generalisation and worse ZSG performance over the distribution of interest. To prevent both instance overfitting and over-generalisation, we introduce self-supervised environment design (SSED). SSED generates levels using a variational autoencoder, effectively reducing MI while minimising the shift with the distribution of interest, and leads to statistically significant improvements in ZSG over fixed-set level sampling strategies and UED methods.

* Currently under review, 9 pages
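
Prioritising levels by value loss can be sketched as follows. The abstract does not say how scores are turned into sampling probabilities, so this example assumes a rank-based scheme with a temperature, similar in spirit to prioritised level replay. The function names, the temperature value, and the example losses are all hypothetical.

```python
import random


def rank_prioritised_probs(value_losses, temperature=0.3):
    """Turn per-level value-loss scores into sampling probabilities.

    Assumed scheme (not taken from the abstract): the level with the
    highest value loss gets rank 1, its weight is (1 / rank) ** (1 / temperature),
    and the weights are normalised to sum to one.
    """
    n = len(value_losses)
    # Level indices ordered from highest to lowest value loss.
    order = sorted(range(n), key=lambda i: -value_losses[i])
    weights = [0.0] * n
    for rank, level in enumerate(order, start=1):
        weights[level] = (1.0 / rank) ** (1.0 / temperature)
    total = sum(weights)
    return [w / total for w in weights]


def sample_level(value_losses, rng, temperature=0.3):
    """Draw one training level index according to the prioritised distribution."""
    probs = rank_prioritised_probs(value_losses, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    losses = [0.05, 0.90, 0.30, 0.10]  # hypothetical value losses for four levels
    rng = random.Random(0)
    print(rank_prioritised_probs(losses))
    print(sample_level(losses, rng))
```

A lower temperature concentrates sampling on the highest-loss levels, while a higher one moves the distribution towards uniform sampling.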

Contextual Pre-Planning on Reward Machine Abstractions for Enhanced Transfer in Deep Reinforcement Learning

Jul 11, 2023

Guy Azran, Mohamad H. Danesh, Stefano V. Albrecht, Sarah Keren

Abstract:Recent studies show that deep reinforcement learning (DRL) agents tend to overfit to the task on which they were trained and fail to adapt to minor environment changes. To expedite learning when transferring to unseen tasks, we propose a novel approach to representing the current task using reward machines (RM), state machine abstractions that induce subtasks based on the current task's rewards and dynamics. Our method provides agents with symbolic representations of optimal transitions from their current abstract state and rewards them for achieving these transitions. These representations are shared across tasks, allowing agents to exploit knowledge of previously encountered symbols and transitions, thus enhancing transfer. Our empirical evaluation shows that our representations improve sample efficiency and few-shot transfer in a variety of domains.

* IJCAI Workshop on Planning and Reinforcement Learning, 2023

Conditional Mutual Information for Disentangled Representations in Reinforcement Learning

May 23, 2023

Mhairi Dunion, Trevor McInroe, Kevin Sebastian Luck, Josiah P. Hanna, Stefano V. Albrecht

Abstract:Reinforcement Learning (RL) environments can produce training data with spurious correlations between features due to the amount of training data or its limited feature coverage. This can lead to RL agents encoding these misleading correlations in their latent representation, preventing the agent from generalising if the correlation changes within the environment or when deployed in the real world. Disentangled representations can improve robustness, but existing disentanglement techniques that minimise mutual information between features require independent features, thus they cannot disentangle correlated features. We propose an auxiliary task for RL algorithms that learns a disentangled representation of high-dimensional observations with correlated features by minimising the conditional mutual information between features in the representation. We demonstrate experimentally, using continuous control tasks, that our approach improves generalisation under correlation shifts, as well as improving the training performance of RL algorithms in the presence of correlated features.

SMAClite: A Lightweight Environment for Multi-Agent Reinforcement Learning

May 09, 2023

Adam Michalski, Filippos Christianos, Stefano V. Albrecht

Abstract:There is a lack of standard benchmarks for Multi-Agent Reinforcement Learning (MARL) algorithms. The StarCraft Multi-Agent Challenge (SMAC) has been widely used in MARL research, but is built on top of a heavy, closed-source computer game, StarCraft II. Thus, SMAC is computationally expensive and requires knowledge and the use of proprietary tools specific to the game for any meaningful alteration or contribution to the environment. We introduce SMAClite -- a challenge based on SMAC that is both decoupled from StarCraft II and open-source, along with a framework which makes it possible to create new content for SMAClite without any special knowledge. We conduct experiments to show that SMAClite is equivalent to SMAC, by training MARL algorithms on SMAClite and reproducing SMAC results. We then show that SMAClite outperforms SMAC in both runtime speed and memory.

Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments

Apr 18, 2023

Alain Andres, Lukas Schäfer, Esther Villar-Rodriguez, Stefano V. Albrecht, Javier Del Ser

Abstract:One of the key challenges of Reinforcement Learning (RL) is the ability of agents to generalise their learned policy to unseen settings. Moreover, training RL agents requires large numbers of interactions with the environment. Motivated by the recent success of Offline RL and Imitation Learning (IL), we conduct a study to investigate whether agents can leverage offline data in the form of trajectories to improve the sample-efficiency in procedurally generated environments. We consider two settings of using IL from offline data for RL: (1) pre-training a policy before online RL training and (2) concurrently training a policy with online RL and IL from offline data. We analyse the impact of the quality (optimality of trajectories) and diversity (number of trajectories and covered level) of available offline trajectories on the effectiveness of both approaches. Across four well-known sparse reward tasks in the MiniGrid environment, we find that using IL for pre-training and concurrently during online RL training both consistently improve the sample-efficiency while converging to optimal policies. Furthermore, we show that pre-training a policy from as few as two trajectories can make the difference between learning an optimal policy at the end of online training and not learning at all. Our findings motivate the widespread adoption of IL for pre-training and concurrent IL in procedurally generated environments whenever offline trajectories are available or can be generated.

* Presented at the Adaptive and Learning Agents Workshop (ALA) at the AAMAS conference 2023
