Stephen Roberts

Population-based Global Optimisation Methods for Learning Long-term Dependencies with RNNs

May 23, 2019

Bryan Lim, Stefan Zohren, Stephen Roberts

Abstract:Despite recent innovations in network architectures and loss functions, training RNNs to learn long-term dependencies remains difficult due to challenges with gradient-based optimisation methods. Inspired by the success of Deep Neuroevolution in reinforcement learning (Such et al. 2017), we explore the use of gradient-free population-based global optimisation (PBO) techniques for training RNNs to capture long-term dependencies in time-series data. Testing evolution strategies (ES) and particle swarm optimisation (PSO) on an application in volatility forecasting, we demonstrate that PBO methods lead to performance improvements in general, with ES exhibiting the most consistent results across a variety of architectures.

* To appear at ICML 2019 Time Series Workshop
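
As a rough illustration of the gradient-free idea in the abstract above, the sketch below runs a simple (mu, lambda) evolution strategy on a toy quadratic. It is not the paper's configuration: `toy_loss` is a hypothetical stand-in for an RNN training loss, and the population size, elite count, noise scale and number of generations are all assumed values.

```python
import random


def evolution_strategy(loss, theta, population=30, elite=6,
                       sigma=0.3, generations=100, seed=0):
    """Minimise `loss` with a simple (mu, lambda) evolution strategy.

    No gradients are used: only loss evaluations of sampled candidates.
    All default hyperparameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    mean = list(theta)
    for _ in range(generations):
        # Sample candidates by perturbing the current mean with Gaussian noise.
        candidates = [
            [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
            for _ in range(population)
        ]
        # Keep the `elite` candidates with the lowest loss.
        candidates.sort(key=loss)
        best = candidates[:elite]
        # Recombine: the new mean is the average of the elite candidates.
        mean = [sum(c[i] for c in best) / elite for i in range(len(mean))]
    return mean


def toy_loss(w):
    # Hypothetical stand-in for a training loss: a quadratic with its
    # minimum at (1.0, -2.0, 0.5).
    target = (1.0, -2.0, 0.5)
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))


start = [0.0, 0.0, 0.0]
solution = evolution_strategy(toy_loss, start)
```

In a real setting `theta` would be the flattened network weights and `loss` the forecasting error on a training batch; the loop itself would stay the same.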

Enhancing Time Series Momentum Strategies Using Deep Neural Networks

Apr 09, 2019

Bryan Lim, Stefan Zohren, Stephen Roberts

Abstract:While time series momentum is a well-studied phenomenon in finance, common strategies require the explicit definition of both a trend estimator and a position sizing rule. In this paper, we introduce Deep Momentum Networks -- a hybrid approach which injects deep learning based trading rules into the volatility scaling framework of time series momentum. The model also simultaneously learns both trend estimation and position sizing in a data-driven manner, with networks directly trained by optimising the Sharpe ratio of the signal. Backtesting on a portfolio of 88 continuous futures contracts, we demonstrate that the Sharpe-optimised LSTM outperforms traditional methods by more than a factor of two in the absence of transaction costs, and continues to outperform them when transaction costs of up to 2-3 basis points are considered. To account for more illiquid assets, we also propose a turnover regularisation term which trains the network to factor in costs at run-time.
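
To make the objective in the abstract above concrete, here is a minimal sketch of volatility-scaled strategy returns and a Sharpe-ratio loss. The function names, the 15% volatility target, the 252-period annualisation and the form of the turnover cost are assumptions chosen for illustration, not details taken from the paper.

```python
import math


def strategy_returns(positions, asset_returns, ex_ante_vol,
                     target_vol=0.15, cost=0.0):
    """Volatility-scaled returns net of a proportional turnover cost.

    positions[t] is the signal held over the period in which
    asset_returns[t] is realised; ex_ante_vol[t] is the volatility
    estimate available when the position is taken (assumed inputs).
    """
    out = []
    prev_scaled = 0.0
    for x, r_next, vol in zip(positions, asset_returns, ex_ante_vol):
        scaled = x * target_vol / vol           # volatility scaling
        turnover = abs(scaled - prev_scaled)    # change in scaled position
        out.append(scaled * r_next - cost * turnover)
        prev_scaled = scaled
    return out


def sharpe_ratio(returns, periods_per_year=252):
    """Annualised Sharpe ratio (zero risk-free rate; needs non-constant returns)."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / n
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def sharpe_loss(returns):
    # Minimising the negative Sharpe ratio maximises the Sharpe ratio.
    return -sharpe_ratio(returns)


positions = [1.0, 1.0, -1.0, 1.0]
asset_returns = [0.01, 0.02, -0.01, 0.03]
vols = [0.15, 0.15, 0.15, 0.15]

gross = strategy_returns(positions, asset_returns, vols)
net = strategy_returns(positions, asset_returns, vols, cost=0.001)
```

With a network producing `positions`, `sharpe_loss` would be the quantity minimised during training; the `cost` argument shows how turnover lowers the net Sharpe ratio relative to the gross one.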

A Machine Learning approach to Risk Minimisation in Electricity Markets with Coregionalized Sparse Gaussian Processes

Apr 03, 2019

Daniel Poh, Stephen Roberts, Martin Tegnér

Abstract:The non-storability of electricity makes it unique among commodity assets, and it is an important driver of its price behaviour in secondary financial markets. The instantaneous and continuous matching of power supply with demand is a key factor explaining its volatility. During periods of high demand, costlier generation capabilities are utilised since electricity cannot be stored, and this drives prices up very quickly. Furthermore, the non-storability also complicates physical hedging. Owing to these factors, the problem of joint price-quantity risk in electricity markets is a commonly studied theme. We propose using Gaussian Processes (GPs) to tackle this problem since GPs provide a versatile and elegant non-parametric approach for regression and time-series modelling. However, GPs scale poorly with the amount of training data due to a cubic complexity. These considerations suggest that knowledge transfer between price and load is vital for effective hedging, and that a computationally efficient method is required. To this end, we use coregionalized (or multi-task) sparse GPs, which address the aforementioned issues. To gauge the performance of our model, we use an average-load strategy as a comparator. The latter is a robust approach commonly used by industry. If the spot and load are uncorrelated and Gaussian, then hedging with the expected load will result in the minimum variance position. Our main contributions are twofold. First, we develop a coregionalized sparse GP-based approach for hedging. Second, we demonstrate that our model-based strategy outperforms the comparator, and can thus be employed for effective hedging in electricity markets.

* 24 pages, 4 figures, journal submission

WiSE-ALE: Wide Sample Estimator for Approximate Latent Embedding

Mar 18, 2019

Shuyu Lin, Ronald Clark, Robert Birke, Niki Trigoni, Stephen Roberts

Abstract:Variational Auto-encoders (VAEs) have been very successful as methods for forming compressed latent representations of complex, often high-dimensional, data. In this paper, we derive an alternative variational lower bound from the one common in VAEs, which aims to minimize aggregate information loss. Using our lower bound as the objective function for an auto-encoder enables us to place a prior on the bulk statistics, corresponding to an aggregate posterior for the entire dataset, as opposed to a single sample posterior as in the original VAE. This alternative form of prior constraint allows individual posteriors more flexibility to preserve necessary information for good reconstruction quality. We further derive an analytic approximation to our lower bound, leading to an efficient learning algorithm - WiSE-ALE. Through various examples, we demonstrate that WiSE-ALE can reach excellent reconstruction quality in comparison to other state-of-the-art VAE models, while still retaining the ability to learn a smooth, compact representation.

* 18 pages, appendix included

Recurrent Neural Filters: Learning Independent Bayesian Filtering Steps for Time Series Prediction

Jan 23, 2019

Bryan Lim, Stefan Zohren, Stephen Roberts

Abstract:Despite the recent popularity of deep generative state space models, few comparisons have been made between network architectures and the inference steps of the Bayesian filtering framework -- with most models simultaneously approximating both state transition and update steps with a single recurrent neural network (RNN). In this paper, we introduce the Recurrent Neural Filter (RNF), a novel recurrent variational autoencoder architecture that learns distinct representations for each Bayesian filtering step, captured by a series of encoders and decoders. Testing this on three real-world time series datasets, we demonstrate that decoupling representations not only improves the accuracy of one-step-ahead forecasts while providing realistic uncertainty estimates, but also facilitates multistep prediction through the separation of encoder stages.

Portfolio Optimization for Cointelated Pairs: SDEs vs. Machine Learning

Dec 26, 2018

Babak Mahdavi-Damghani, Konul Mustafayeva, Stephen Roberts, Cristin Buescu

Abstract:We investigate the problem of dynamic portfolio optimization in a continuous-time, finite-horizon setting for a portfolio of two stocks and one risk-free asset. The stocks follow the Cointelation model. The proposed optimization methods are twofold. In what we call a Stochastic Differential Equation approach, we compute the optimal weights using the mean-variance criterion and power utility maximization. We show that dynamically switching between these two optimal strategies by introducing a triggering function can further improve the portfolio returns. We contrast this with the machine learning clustering methodology inspired by the band-wise Gaussian mixture model. The first benefit of the machine learning approach over the Stochastic Differential Equation approach is that we were able to achieve the same results through a simpler channel. The second advantage is its flexibility to regime change.

Bayesian deep neural networks for low-cost neurophysiological markers of Alzheimer's disease severity

Dec 13, 2018

Wolfgang Fruehwirt, Adam D. Cobb, Martin Mairhofer, Leonard Weydemann, Heinrich Garn, Reinhold Schmidt, Thomas Benke, Peter Dal-Bianco, Gerhard Ransmayr, Markus Waser (+4 more)

Abstract:As societies around the world are ageing, the number of Alzheimer's disease (AD) patients is rapidly increasing. To date, no low-cost, non-invasive biomarkers have been established to advance the objectivization of AD diagnosis and progression assessment. Here, we utilize Bayesian neural networks to develop a multivariate predictor for AD severity using a wide range of quantitative EEG (QEEG) markers. The Bayesian treatment of neural networks both automatically controls model complexity and provides a predictive distribution over the target function, giving uncertainty bounds for our regression task. It is therefore well suited to clinical neuroscience, where data sets are typically sparse and practitioners require a precise assessment of the predictive uncertainty. We use data from one of the largest prospective AD EEG trials ever conducted to demonstrate the potential of Bayesian deep learning in this domain, while comparing two distinct Bayesian neural network approaches, i.e., Monte Carlo dropout and Hamiltonian Monte Carlo.

* Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

Intersectionality: Multiple Group Fairness in Expectation Constraints

Nov 25, 2018

Jack Fitzsimons, Michael Osborne, Stephen Roberts

Abstract:Group fairness is an important concern for machine learning researchers, developers, and regulators. However, the strictness to which models must be constrained to be considered fair is still under debate. The focus of this work is on constraining the expected outcome of subpopulations in kernel regression and, in particular, decision tree regression, with application to random forests, boosted trees and other ensemble models. While individual constraints were previously addressed, this work addresses concerns about incorporating multiple constraints simultaneously. The proposed solution does not affect the order of computational or memory complexity of the decision trees and is easily integrated into models post training.

* NeurIPS (previously NIPS) 2018, Workshop on Ethical, Social and Governance Issues in AI

Practical Bayesian Learning of Neural Networks via Adaptive Subgradient Methods

Nov 08, 2018

Arnold Salas, Stefan Zohren, Stephen Roberts

Abstract:We introduce a novel framework for the estimation of the posterior distribution of the weights of a neural network, based on a new probabilistic interpretation of adaptive subgradient algorithms such as AdaGrad and Adam. Having a confidence measure of the weights allows several shortcomings of neural networks to be addressed. In particular, the robustness of the network can be improved by performing weight pruning based on signal-to-noise ratios from the weight posterior distribution. Using the MNIST dataset, we demonstrate that the empirical performance of Badam, a particular instance of our framework based on Adam, is competitive in comparison to related Bayesian approaches such as Bayes By Backprop.

* Manuscript under review by AISTATS 2019
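
The pruning step mentioned in the abstract above can be sketched as follows, assuming the signal-to-noise ratio of a weight is defined as |mean| / standard deviation of its posterior. That definition, the function name `prune_by_snr` and the example numbers are illustrative assumptions, not taken from the paper.

```python
def prune_by_snr(means, stds, keep_fraction=0.5):
    """Zero out the weights with the lowest posterior signal-to-noise ratio.

    means, stds: per-weight posterior mean and standard deviation
    (hypothetical inputs; stds must be positive).
    keep_fraction: share of weights to keep; ties at the threshold are kept.
    """
    snr = [abs(m) / s for m, s in zip(means, stds)]
    n_keep = max(1, round(keep_fraction * len(means)))
    # The n_keep-th largest SNR is the cut-off for keeping a weight.
    threshold = sorted(snr, reverse=True)[n_keep - 1]
    return [m if r >= threshold else 0.0 for m, r in zip(means, snr)]


posterior_means = [0.8, -0.05, 0.3, -1.2]
posterior_stds = [0.1, 0.5, 0.6, 0.2]
pruned = prune_by_snr(posterior_means, posterior_stds, keep_fraction=0.5)
```

Weights whose posterior is wide relative to its mean carry little reliable signal, so they are the first to be removed; the two weights with the highest ratio survive in this example.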

Semi-unsupervised Learning of Human Activity using Deep Generative Models

Oct 29, 2018

Matthew Willetts, Aiden Doherty, Stephen Roberts, Chris Holmes

Abstract:Here we demonstrate a new deep generative model for classification. We introduce 'semi-unsupervised learning', a problem regime related to transfer learning and zero/few shot learning where, in the training data, some classes are sparsely labelled and others entirely unlabelled. Models able to learn from training data of this type are potentially of great use, as many medical datasets are 'semi-unsupervised'. Our model demonstrates superior semi-unsupervised classification performance on MNIST to model M2 from Kingma and Welling (2014). We apply the model to human accelerometer data, performing activity classification and structure discovery on windows of time series data.

* 4 pages, 2 figures, conference workshop pre-print
