Stefan Zohren

Forecasting COVID-19 Caseloads Using Unsupervised Embedding Clusters of Social Media Posts

May 20, 2022
Felix Drinkall, Stefan Zohren, Janet B. Pierrehumbert

We present a novel approach incorporating transformer-based language models into infectious disease modelling. Text-derived features are quantified by tracking high-density clusters of sentence-level representations of Reddit posts within specific US states' COVID-19 subreddits. We benchmark these clustered embedding features against features extracted from other high-quality datasets. In a threshold-classification task, we show that they outperform all other feature types at predicting upward trend signals, a significant result for infectious disease modelling in areas where epidemiological data is unreliable. Subsequently, in a time-series forecasting task, we fully utilise the predictive power of the caseload and compare the relative strengths of using different supplementary datasets as covariate feature sets in a transformer-based time-series model.
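
The abstract does not name the exact tooling, so the following is a minimal sketch of the clustered-embedding feature pipeline, assuming a sentence-transformers encoder and HDBSCAN for the high-density clustering; the per-bucket cluster counts would then serve as the text-derived covariates.

```python
# Minimal sketch (assumed tooling): embed posts, find high-density
# clusters, and count posts per cluster per time bucket as features.
import numpy as np
import hdbscan
from sentence_transformers import SentenceTransformer

def cluster_count_features(posts, buckets, n_buckets):
    # posts: list of post texts; buckets: integer time bucket per post
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in model
    embeddings = encoder.encode(posts)
    labels = hdbscan.HDBSCAN(min_cluster_size=25).fit(embeddings).labels_
    n_clusters = labels.max() + 1
    counts = np.zeros((n_buckets, n_clusters))
    for label, bucket in zip(labels, buckets):
        if label >= 0:  # -1 marks low-density "noise" posts
            counts[bucket, label] += 1
    return counts  # one covariate time series per cluster
```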

* NAACL 2022 

Trading with the Momentum Transformer: An Intelligent and Interpretable Architecture

Dec 16, 2021
Kieran Wood, Sven Giegerich, Stephen Roberts, Stefan Zohren

Deep learning architectures, specifically Deep Momentum Networks (DMNs) [1904.04912], have been found to be an effective approach to momentum and mean-reversion trading. However, some of the key challenges in recent years have involved learning long-term dependencies, avoiding performance degradation when considering returns net of transaction costs, and adapting to new market regimes, notably during the SARS-CoV-2 crisis. Attention mechanisms, or Transformer-based architectures, are a solution to such challenges because they allow the network to focus on significant time steps in the past and on longer-term patterns. We introduce the Momentum Transformer, an attention-based architecture which outperforms the benchmarks and is inherently interpretable, providing us with greater insights into our deep learning trading strategy. Our model is an extension of the LSTM-based DMN, which directly outputs position sizing by optimising the network on a risk-adjusted performance metric, such as the Sharpe ratio. We find that an attention-LSTM hybrid Decoder-Only Temporal Fusion Transformer (TFT) style architecture is the best-performing model. In terms of interpretability, we observe remarkable structure in the attention patterns, with significant peaks of importance at momentum turning points. The time series is thus segmented into regimes, and the model tends to focus on previous time steps in similar regimes. We find that changepoint detection (CPD) [2105.13727], another technique for responding to regime change, can complement multi-headed attention, especially when we run CPD at multiple timescales. Through the addition of an interpretable variable selection network, we observe how CPD helps our model move away from trading predominantly on daily returns data. We note that the model can intelligently switch between, and blend, classical strategies, basing its decision on patterns in the data.
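
To make the position-sizing objective concrete, here is a minimal sketch of the Sharpe-ratio loss that Deep Momentum Networks optimise directly; the network body (LSTM or the TFT-style attention hybrid) is assumed and not shown.

```python
# Sketch: train a network that outputs positions in [-1, 1] by directly
# maximising the Sharpe ratio of the resulting strategy returns.
import torch

def sharpe_loss(positions, next_returns, eps=1e-8):
    strategy_returns = positions * next_returns  # captured return per step
    # negate: optimisers minimise, we want to maximise the Sharpe ratio
    return -strategy_returns.mean() / (strategy_returns.std() + eps)

# positions = torch.tanh(model(features))  # model: LSTM/TFT-style body
```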

Realised Volatility Forecasting: Machine Learning via Financial Word Embedding

Aug 01, 2021
Eghbal Rahimikia, Stefan Zohren, Ser-Huang Poon

We develop FinText, a novel, state-of-the-art financial word embedding built from the Dow Jones Newswires Text News Feed database. Incorporating this word embedding in a machine learning model produces a substantial increase in volatility forecasting performance on days with volatility jumps for 23 NASDAQ stocks from 27 July 2007 to 18 November 2016. A simple ensemble model, combining our word embedding and another machine learning model that uses limit order book data, provides the best forecasting performance for both normal and jump volatility days. Finally, we use Integrated Gradients and SHAP (SHapley Additive exPlanations) to make the results more 'explainable' and the model comparisons more transparent.
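
The FinText embedding itself is trained on a proprietary news corpus, so the sketch below only illustrates the general shape of such a pipeline, assuming a gensim word2vec setup with a placeholder corpus and hyperparameters.

```python
# Illustrative only: a skip-gram word embedding trained on tokenised
# news sentences (placeholder corpus, assumed hyperparameters).
from gensim.models import Word2Vec

corpus = [["volatility", "spiked", "after", "earnings"],
          ["stocks", "fell", "sharply"]]
embedding = Word2Vec(corpus, vector_size=300, window=5, min_count=1, sg=1)
vector = embedding.wv["volatility"]  # feature vector fed to the ML model
```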

Slow Momentum with Fast Reversion: A Trading Strategy Using Deep Learning and Changepoint Detection

Jun 18, 2021
Kieran Wood, Stephen Roberts, Stefan Zohren

Momentum strategies are an important part of alternative investments and are at the heart of commodity trading advisors (CTAs). These strategies have, however, been found to have difficulties adjusting to rapid changes in market conditions, such as during the 2020 market crash. In particular, immediately after momentum turning points, where a trend reverses from an uptrend (downtrend) to a downtrend (uptrend), time-series momentum (TSMOM) strategies are prone to making bad bets. To improve the response to regime change, we introduce a novel approach in which we insert an online changepoint detection (CPD) module into a Deep Momentum Network (DMN) [1904.04912] pipeline, which uses an LSTM deep-learning architecture to simultaneously learn both trend estimation and position sizing. Furthermore, our model is able to optimise the way in which it balances 1) a slow momentum strategy, which exploits persisting trends but does not overreact to localised price moves, and 2) a fast mean-reversion strategy, which quickly flips its position and then swaps it back again to exploit localised price moves. Our CPD module outputs a changepoint location and severity score, allowing our model to learn to respond to varying degrees of disequilibrium, or to smaller and more localised changepoints, in a data-driven manner. Using a portfolio of 50 liquid, continuous futures contracts over the period 1990-2020, the addition of the CPD module leads to an improvement in Sharpe ratio of one-third. Even more notably, this module is especially beneficial in periods of significant nonstationarity; in particular, over the most recent years tested (2015-2020) the performance boost is approximately two-thirds. This is especially interesting as traditional momentum strategies have been underperforming in this period.
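
The paper's CPD module is Gaussian-Process-based; as a rough illustration of the (location, severity) feature pair it feeds to the DMN, here is a sketch that substitutes ruptures' PELT detector, with all thresholds assumed.

```python
# Rough sketch (ruptures' PELT stands in for the paper's GP-based
# module): return a normalised changepoint location and severity score.
import numpy as np
import ruptures as rpt

def cpd_features(returns, lookback=21):
    window = np.asarray(returns[-lookback:], dtype=float)
    breakpoints = rpt.Pelt(model="rbf", min_size=3).fit(
        window.reshape(-1, 1)).predict(pen=3)
    if len(breakpoints) < 2:          # no changepoint inside the window
        return 1.0, 0.0
    t = breakpoints[0]
    pre, post = window[:t], window[t:]
    severity = abs(post.mean() - pre.mean()) / (window.std() + 1e-8)
    return t / lookback, severity     # appended to the DMN inputs
```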

* minor corrections, strategy comparison for 2015-2020 made more robust by repeated trials 

Same State, Different Task: Continual Reinforcement Learning without Interference

Jun 05, 2021
Samuel Kessler, Jack Parker-Holder, Philip Ball, Stefan Zohren, Stephen J. Roberts

Continual Learning (CL) considers the problem of training an agent sequentially on a set of tasks while seeking to retain performance on all previous tasks. A key challenge in CL is catastrophic forgetting, which arises when performance on a previously mastered task is reduced when learning a new task. While a variety of methods exist to combat forgetting, in some cases tasks are fundamentally incompatible with each other and thus cannot be learnt by a single policy. This can occur in reinforcement learning (RL), when an agent may be rewarded for achieving different goals from the same observation. In this paper we formalize this "interference" as distinct from the problem of forgetting. We show that existing CL methods based on single neural network predictors with shared replay buffers fail in the presence of interference. Instead, we propose a simple method, OWL, to address this challenge. OWL learns a factorized policy, using shared feature extraction layers but separate heads, each specializing on a new task. The separate heads in OWL are used to prevent interference. At test time, we formulate policy selection as a multi-armed bandit problem and show that it is possible to select the best policy for an unknown task using feedback from the environment. The use of bandit algorithms allows the OWL agent to constructively re-use different continually learnt policies at different times during an episode. We show in multiple RL environments that existing replay-based CL methods fail, while OWL is able to achieve close to optimal performance when training sequentially.
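
A minimal sketch of the factorised-policy idea, with an assumed MLP trunk and discrete action heads; the bandit-based head selection at test time is only indicated in a comment.

```python
# Sketch of a factorised policy: shared trunk, one head per task, so a
# new task's head can be trained without overwriting earlier ones.
import torch.nn as nn

class FactorisedPolicy(nn.Module):
    def __init__(self, obs_dim, n_actions, n_tasks, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, n_actions) for _ in range(n_tasks)]
        )

    def forward(self, obs, head_id):
        return self.heads[head_id](self.trunk(obs))  # action logits

# At test time, a bandit over episodic returns selects head_id online.
```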

* 20 pages, 12 figures 

Multi-Horizon Forecasting for Limit Order Books: Novel Deep Learning Approaches and Hardware Acceleration using Intelligent Processing Units

May 21, 2021
Zihao Zhang, Stefan Zohren

We design multi-horizon forecasting models for limit order book (LOB) data by using deep learning techniques. Unlike standard structures in which a single prediction is made, we adopt encoder-decoder models with sequence-to-sequence and attention mechanisms to generate a forecasting path. Our methods achieve comparable performance to state-of-the-art algorithms at short prediction horizons. Importantly, they outperform when generating predictions over long horizons by leveraging the multi-horizon setup. Given that encoder-decoder models rely on recurrent neural layers, they generally suffer from slow training. To remedy this, we experiment with novel hardware, so-called Intelligent Processing Units (IPUs) produced by Graphcore. IPUs are specifically designed for machine intelligence workloads with the aim of speeding up computation. We show that in our setup this leads to significantly faster training times when compared to training models with GPUs.
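
To make the multi-horizon setup concrete, here is a minimal encoder-decoder sketch that unrolls a forecast path instead of emitting a single prediction; layer sizes and the greedy unrolling are assumptions, not the paper's exact architecture.

```python
# Sketch: LSTM encoder-decoder that emits a multi-step forecast path.
import torch
import torch.nn as nn

class Seq2SeqForecaster(nn.Module):
    def __init__(self, n_features, hidden=64, horizons=5):
        super().__init__()
        self.horizons = horizons
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTM(1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                       # x: (batch, seq, features)
        _, state = self.encoder(x)
        step = x.new_zeros(x.size(0), 1, 1)     # decoder start token
        path = []
        for _ in range(self.horizons):          # feed each step back in
            out, state = self.decoder(step, state)
            step = self.head(out)
            path.append(step)
        return torch.cat(path, dim=1)           # (batch, horizons, 1)
```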

* 12 pages, 6 figures, and 2 tables 

Enhancing Cross-Sectional Currency Strategies by Ranking Refinement with Transformer-based Architectures

May 20, 2021
Daniel Poh, Bryan Lim, Stefan Zohren, Stephen Roberts

The performance of a cross-sectional currency strategy depends crucially on accurately ranking instruments prior to portfolio construction. While this ranking step is traditionally performed using heuristics, or by sorting outputs produced by pointwise regression or classification models, Learning to Rank algorithms have recently presented themselves as competitive and viable alternatives. Despite improving ranking accuracy on average, however, these techniques do not account for the possibility that assets positioned at the extreme ends of the ranked list -- which are ultimately used to construct the long/short portfolios -- can assume different distributions in the input space, and thus lead to sub-optimal strategy performance. Drawing from research in Information Retrieval demonstrating that contextual information embedded within top-ranked documents can be used to learn a query's characteristics and improve ranking, we propose an analogous approach: exploiting the features of both out- and under-performing instruments to learn a model for refining the original ranked list. Under a re-ranking framework, we adapt the Transformer architecture to encode the features of extreme assets, refining our selection of long/short instruments obtained with an initial retrieval. Backtesting on a set of 31 currencies, our proposed methodology significantly boosts Sharpe ratios -- by approximately 20% over the original LTR algorithms, and double that over traditional baselines.
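
A minimal sketch of the re-ranking step described above: the features of the initially top- and bottom-ranked instruments are jointly encoded with a Transformer encoder and re-scored (all dimensions are assumptions).

```python
# Sketch: refine an initial ranking by jointly encoding the features of
# the extreme (top-k and bottom-k) assets with self-attention.
import torch.nn as nn

class ExtremeReRanker(nn.Module):
    def __init__(self, n_features, d_model=32, n_heads=4):
        super().__init__()
        self.proj = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.score = nn.Linear(d_model, 1)

    def forward(self, extreme_assets):    # (batch, 2k, n_features)
        encoded = self.encoder(self.proj(extreme_assets))
        return self.score(encoded).squeeze(-1)  # refined ranking scores
```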

* 7 pages, 3 figures 

Deep Learning for Market by Order Data

Feb 17, 2021
Zihao Zhang, Bryan Lim, Stefan Zohren

Market by order (MBO) data - a detailed feed of individual trade instructions for a given stock on an exchange - is arguably one of the most granular sources of microstructure information. While limit order books (LOBs) are implicitly derived from it, MBO data is largely neglected by the current academic literature, which focuses primarily on LOB modelling. In this paper, we demonstrate the utility of MBO data for forecasting high-frequency price movements, providing an orthogonal source of information to LOB snapshots. We provide the first predictive analysis of MBO data by carefully introducing the data structure and presenting a specific normalisation scheme that accounts for level information in order books and allows model training with multiple instruments. Through forecasting experiments using deep neural networks, we show that while MBO-driven and LOB-driven models individually provide similar performance, ensembles of the two can lead to improvements in forecasting accuracy -- indicating that MBO data is additive to LOB-based features.
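
The abstract does not spell out the normalisation scheme, so the following is only a speculative sketch of its general shape: express order prices relative to the prevailing mid-price and rescale per instrument so that several instruments can share one model.

```python
# Speculative sketch of a cross-instrument MBO normalisation.
import numpy as np

def normalise_mbo(prices, sizes, mids, lookback=1000):
    rel_price = (prices - mids) / mids            # distance from mid
    price_scale = rel_price[-lookback:].std() + 1e-12
    size_scale = sizes[-lookback:].mean() + 1e-12
    return np.stack([rel_price / price_scale,     # per-instrument scaling
                     sizes / size_scale], axis=-1)
```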

* 16 pages, 6 figures 

Building Cross-Sectional Systematic Strategies By Learning to Rank

Dec 13, 2020
Daniel Poh, Bryan Lim, Stefan Zohren, Stephen Roberts

The success of a cross-sectional systematic strategy depends critically on accurately ranking assets prior to portfolio construction. Contemporary techniques perform this ranking step either with simple heuristics or by sorting outputs from standard regression or classification models, which have been demonstrated to be sub-optimal for ranking in other domains (e.g. information retrieval). To address this deficiency, we propose a framework to enhance cross-sectional portfolios by incorporating learning-to-rank algorithms, which improve ranking accuracy by learning pairwise and listwise structures across instruments. Using cross-sectional momentum as a demonstrative case study, we show that the use of modern machine learning ranking algorithms can substantially improve the trading performance of cross-sectional strategies -- providing an approximately threefold boost in Sharpe ratios compared to traditional approaches.
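
As one concrete example of the pairwise structure mentioned above, here is a minimal RankNet-style loss sketch for a single cross-sectional date (an illustration of the family of methods, not the paper's exact algorithm).

```python
# Sketch of a pairwise (RankNet-style) ranking loss: for every asset
# pair, the score difference should predict which asset out-performed.
import torch.nn.functional as F

def pairwise_rank_loss(scores, realised_returns):
    # scores, realised_returns: 1-D tensors over one date's asset universe
    score_diff = scores.unsqueeze(0) - scores.unsqueeze(1)
    target = (realised_returns.unsqueeze(0)
              > realised_returns.unsqueeze(1)).float()
    return F.binary_cross_entropy_with_logits(score_diff, target)
```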

* 12 pages, 3 figures 