Stephen Roberts

Augmented World Models Facilitate Zero-Shot Dynamics Generalization From a Single Offline Environment

Apr 12, 2021

Philip J. Ball, Cong Lu, Jack Parker-Holder, Stephen Roberts

Abstract: Reinforcement learning from large-scale offline datasets provides us with the ability to learn policies without potentially unsafe or impractical exploration. Significant progress has been made in the past few years in dealing with the challenge of correcting for differing behavior between the data collection and learned policies. However, little attention has been paid to potentially changing dynamics when transferring a policy to the online setting, where the performance of existing methods can drop by up to 90%. In this paper we address this problem with Augmented World Models (AugWM). We augment a learned dynamics model with simple transformations that seek to capture potential changes in physical properties of the robot, leading to more robust policies. We not only train our policy in this new setting, but also provide it with the sampled augmentation as a context, allowing it to adapt to changes in the environment. At test time we learn the context in a self-supervised fashion by approximating the augmentation which corresponds to the new environment. We rigorously evaluate our approach on over 100 different changed dynamics settings, and show that this simple approach can significantly improve the zero-shot generalization of a recent state-of-the-art baseline, often achieving successful policies where the baseline fails.

* To be presented as a Spotlight at the "Self-Supervision for Reinforcement Learning Workshop" @ ICLR 2021
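
As a rough illustration of the augmentation and context idea in the abstract above, the following Python sketch perturbs a model-predicted transition and then recovers the perturbation from observed data. Everything specific here is an assumption rather than the paper's stated method: the augmentation is taken to be a per-dimension scaling `z` of the predicted state change, `z` is sampled uniformly from a hypothetical range, and the test-time context is estimated by per-dimension least squares. All function names are illustrative.

```python
import random


def sample_context(dim, low=0.5, high=1.5, rng=None):
    """Sample one scaling factor per state dimension (range is hypothetical)."""
    rng = rng or random.Random()
    return [rng.uniform(low, high) for _ in range(dim)]


def augment_transition(state, model_next, z):
    """Scale the model-predicted state change by the context z, per dimension.

    Assumed form: s'_aug = s + z * (s'_model - s).
    """
    return [s + zi * (n - s) for s, n, zi in zip(state, model_next, z)]


def estimate_context(model_deltas, observed_deltas, eps=1e-12):
    """Estimate z by least squares: observed_delta ~= z * model_delta.

    Both arguments are lists of per-step state changes (one list per step).
    """
    dim = len(model_deltas[0])
    z = []
    for i in range(dim):
        num = sum(m[i] * o[i] for m, o in zip(model_deltas, observed_deltas))
        den = sum(m[i] ** 2 for m in model_deltas)
        z.append(num / (den + eps))  # eps guards against an all-zero column
    return z
```

During training, a policy in this sketch would receive both the augmented next state and the sampled `z`; at test time, `estimate_context` would supply `z` from a handful of real transitions.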

Adversarial Robustness Guarantees for Gaussian Processes

Apr 07, 2021

Andrea Patane, Arno Blaas, Luca Laurenti, Luca Cardelli, Stephen Roberts, Marta Kwiatkowska

Abstract: Gaussian processes (GPs) enable principled computation of model uncertainty, making them attractive for safety-critical applications. Such scenarios demand that GP decisions are not only accurate, but also robust to perturbations. In this paper we present a framework to analyse adversarial robustness of GPs, defined as invariance of the model's decision to bounded perturbations. Given a compact subset of the input space $T\subseteq \mathbb{R}^d$, a point $x^*$ and a GP, we provide provable guarantees of adversarial robustness of the GP by computing lower and upper bounds on its prediction range in $T$. We develop a branch-and-bound scheme to refine the bounds and show, for any $\epsilon > 0$, that our algorithm is guaranteed to converge to values $\epsilon$-close to the actual values in finitely many iterations. The algorithm is anytime and can handle both regression and classification tasks, with analytical formulation for most kernels used in practice. We evaluate our methods on a collection of synthetic and standard benchmark datasets, including SPAM, MNIST and FashionMNIST. We study the effect of approximate inference techniques on robustness and demonstrate how our method can be used for interpretability. Our empirical results suggest that the adversarial robustness of GPs increases with accurate posterior estimation.

* Submitted for publication

Building Cross-Sectional Systematic Strategies By Learning to Rank

Dec 13, 2020

Daniel Poh, Bryan Lim, Stefan Zohren, Stephen Roberts

Abstract: The success of a cross-sectional systematic strategy depends critically on accurately ranking assets prior to portfolio construction. Contemporary techniques perform this ranking step either with simple heuristics or by sorting outputs from standard regression or classification models, which have been demonstrated to be sub-optimal for ranking in other domains (e.g. information retrieval). To address this deficiency, we propose a framework to enhance cross-sectional portfolios by incorporating learning-to-rank algorithms, which improve ranking accuracy by learning pairwise and listwise structures across instruments. Using cross-sectional momentum as a demonstrative case study, we show that the use of modern machine learning ranking algorithms can substantially improve the trading performance of cross-sectional strategies, boosting Sharpe ratios approximately threefold compared to traditional approaches.

* 12 pages, 3 figures
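
To make the "rank, then construct the portfolio" step in the abstract above concrete, here is a minimal Python sketch that turns per-asset scores into a long-short portfolio. The scores could come from any ranker (a heuristic, a regression model, or a learning-to-rank model); the equal weighting and the choice of `k` are assumptions for illustration, not details taken from the paper.

```python
def long_short_weights(scores, k):
    """Go long the k highest-scored assets and short the k lowest-scored.

    Weights are equal within each leg (an illustrative assumption), so the
    long leg sums to +1 and the short leg sums to -1.
    """
    n = len(scores)
    if k <= 0 or 2 * k > n:
        raise ValueError("k must satisfy 0 < 2 * k <= number of assets")

    # Asset indices ordered from lowest score to highest score.
    order = sorted(range(n), key=lambda i: scores[i])

    weights = [0.0] * n
    for i in order[:k]:       # bottom-k assets: short leg
        weights[i] = -1.0 / k
    for i in order[-k:]:      # top-k assets: long leg
        weights[i] = 1.0 / k
    return weights
```

Only the ordering of the scores matters here, which is why ranking accuracy, rather than the accuracy of the raw scores, drives the resulting portfolio.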

Explaining the Adaptive Generalisation Gap

Nov 15, 2020

Diego Granziol, Samuel Albanie, Xingchen Wan, Stephen Roberts

Abstract: We conjecture that the reason for the difference in generalisation between adaptive and non-adaptive gradient methods stems from the failure of adaptive methods to account for the greater levels of noise associated with flatter directions in their estimates of local curvature. This conjecture, motivated by results in random matrix theory, has implications for optimisation in both simple convex settings and deep neural networks. We demonstrate that typical schedules used for adaptive methods (with low numerical stability or damping constants) bias movement towards flat directions relative to sharp directions, effectively amplifying the noise-to-signal ratio and harming generalisation. We show that the numerical stability/damping constant used in these methods can be decomposed into a learning rate reduction and linear shrinkage of the estimated curvature matrix. We then demonstrate significant generalisation improvements by increasing the shrinkage coefficient, closing the generalisation gap entirely in our neural network experiments. Finally, we show that other popular modifications to adaptive methods, such as decoupled weight decay and partial adaptivity, can be shown to calibrate parameter updates to make better use of sharper, more reliable directions.

SafePILCO: a software tool for safe and data-efficient policy synthesis

Aug 07, 2020

Kyriakos Polymenakos, Nikitas Rontsis, Alessandro Abate, Stephen Roberts

Abstract: SafePILCO is a software tool for safe and data-efficient policy search with reinforcement learning. It extends the known PILCO algorithm, originally written in MATLAB, to support safe learning. We provide a Python implementation and leverage existing libraries that allow the codebase to remain short and modular, which is appropriate for wider use by the verification, reinforcement learning, and control communities.

* Shorter Version published as a software tool demonstration at QEST 2020

Explicit Regularisation in Gaussian Noise Injections

Jul 14, 2020

Alexander Camuto, Matthew Willetts, Umut Şimşekli, Stephen Roberts, Chris Holmes

Abstract: We study the regularisation induced in neural networks by Gaussian noise injections (GNIs). Though such injections have been extensively studied when applied to data, there have been few studies on understanding the regularising effect they induce when applied to network activations. Here we derive the explicit regulariser of GNIs, obtained by marginalising out the injected noise, and show that it is a form of Tikhonov regularisation which penalises functions with high-frequency components in the Fourier domain. We show analytically and empirically that such regularisation produces calibrated classifiers with large classification margins and that the explicit regulariser we derive is able to reproduce these effects.

* 10 Pages
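
For readers unfamiliar with noise injection on activations (as opposed to on input data), the Python sketch below shows one way it could look in a tiny fully connected network. It is a minimal sketch under stated assumptions: the noise is additive, zero-mean, with a single standard deviation `sigma` applied after each hidden ReLU layer, and the helper names are hypothetical. The paper's exact injection scheme and its derived regulariser are not reproduced here.

```python
import random


def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def forward(x, layers, sigma=0.0, rng=None):
    """Forward pass with optional Gaussian noise on hidden activations.

    `layers` is a list of (weights, bias) pairs. Noise is added only to
    hidden activations, never to the final output (an assumption).
    """
    rng = rng or random.Random()
    h = list(x)
    last = len(layers) - 1
    for idx, (weights, bias) in enumerate(layers):
        h = linear(h, weights, bias)
        if idx < last:
            h = relu(h)
            if sigma > 0.0:
                # Additive zero-mean Gaussian noise on each activation.
                h = [v + rng.gauss(0.0, sigma) for v in h]
    return h
```

With `sigma=0.0` the pass is deterministic; averaging a loss over many noisy passes corresponds to the marginalisation over injected noise that the abstract refers to.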

Towards a Theoretical Understanding of the Robustness of Variational Autoencoders

Jul 14, 2020

Alexander Camuto, Matthew Willetts, Stephen Roberts, Chris Holmes, Tom Rainforth

Abstract: We make inroads into understanding the robustness of Variational Autoencoders (VAEs) to adversarial attacks and other input perturbations. While previous work has developed algorithmic approaches to attacking and defending VAEs, there remains a lack of formalization for what it means for a VAE to be robust. To address this, we develop a novel criterion for robustness in probabilistic models: $r$-robustness. We then use this to construct the first theoretical results for the robustness of VAEs, deriving margins in the input space for which we can provide guarantees about the resulting reconstruction. Informally, we are able to define a region within which any perturbation will produce a reconstruction that is similar to the original reconstruction. To support our analysis, we show not only that VAEs trained using disentangling methods score well under our robustness metrics, but also that the reasons for this can be interpreted through our theoretical results.

* 10 Pages

Relaxed-Responsibility Hierarchical Discrete VAEs

Jul 14, 2020

Matthew Willetts, Xenia Miscouridou, Stephen Roberts, Chris Holmes

Abstract: Successfully training Variational Autoencoders (VAEs) with a hierarchy of discrete latent variables remains an area of active research. Leveraging insights from classical methods of inference, we introduce $\textit{Relaxed-Responsibility Vector-Quantisation}$, a novel way to parameterise discrete latent variables and a refinement of relaxed Vector-Quantisation. This enables a novel approach to hierarchical discrete variational autoencoders with numerous layers of latent variables that we train end-to-end. Unlike discrete VAEs with a single layer of latent variables, we can produce realistic-looking samples by ancestral sampling: it is not essential to train a second generative model over the learnt latent representations to sample from and then decode. Further, we observe different layers of our model become associated with different aspects of the data.

* 10 Pages

On Optimism in Model-Based Reinforcement Learning

Jun 21, 2020

Aldo Pacchiano, Philip Ball, Jack Parker-Holder, Krzysztof Choromanski, Stephen Roberts

Abstract: The principle of optimism in the face of uncertainty is prevalent throughout sequential decision-making problems such as multi-armed bandits and reinforcement learning (RL), often coming with strong theoretical guarantees. However, it remains a challenge to scale these approaches to the deep RL paradigm, which has received a great deal of attention in recent years. In this paper, we introduce a tractable approach to optimism via noise-augmented Markov Decision Processes (MDPs), which we show can obtain a competitive regret bound: $\tilde{\mathcal{O}}( |\mathcal{S}|H\sqrt{|\mathcal{S}||\mathcal{A}| T } )$ when augmenting using Gaussian noise, where $T$ is the total number of environment steps. This tractability allows us to apply our approach to the deep RL setting, where we rigorously evaluate the key factors for success of optimistic model-based RL algorithms, bridging the gap between theory and practice.

Deep Learning for Portfolio Optimisation

May 27, 2020

Zihao Zhang, Stefan Zohren, Stephen Roberts

Abstract: We adopt deep learning models to directly optimise the portfolio Sharpe ratio. The framework we present circumvents the requirements for forecasting expected returns and allows us to directly optimise portfolio weights by updating model parameters. Instead of selecting individual assets, we trade Exchange-Traded Funds (ETFs) of market indices to form a portfolio. Indices of different asset classes show robust correlations, and trading them substantially reduces the spectrum of available assets to choose from. We compare our method with a wide range of algorithms, with results showing that our model obtains the best performance over the testing period, from 2011 to the end of April 2020, including the financial instabilities of the first quarter of 2020. A sensitivity analysis is included to understand the relevance of input features, and we further study the performance of our approach under different cost rates and different risk levels via volatility scaling.

* 12 pages, 6 figures
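
The idea of optimising the Sharpe ratio directly, rather than forecasting returns first, can be sketched in a few lines of Python. This is only an illustrative objective under assumptions not stated in the abstract: raw model outputs (`scores`) are mapped to long-only weights with a softmax, the Sharpe ratio is computed per period without annualisation or a risk-free rate, and the function names are hypothetical. A real implementation would maximise this quantity with respect to model parameters using an automatic-differentiation library.

```python
import math
from statistics import mean, pstdev


def softmax(scores):
    """Map raw scores to positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def portfolio_returns(weights, asset_returns):
    """Per-period portfolio return; asset_returns[t][i] is asset i at time t."""
    return [sum(w * r for w, r in zip(weights, period))
            for period in asset_returns]


def sharpe_ratio(returns, eps=1e-12):
    """Mean return divided by its (population) standard deviation."""
    return mean(returns) / (pstdev(returns) + eps)


def sharpe_of_scores(scores, asset_returns):
    """Objective to maximise: Sharpe ratio of the softmax-weighted portfolio."""
    weights = softmax(scores)
    return sharpe_ratio(portfolio_returns(weights, asset_returns))
```

Because the objective depends on the scores only through the realised portfolio returns, no explicit forecast of expected returns is needed, which is the point the abstract makes.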
