We present a novel approach incorporating transformer-based language models into infectious disease modelling. Text-derived features are quantified by tracking high-density clusters of sentence-level representations of Reddit posts within specific US states' COVID-19 subreddits. We benchmark these clustered embedding features against features extracted from other high-quality datasets. In a threshold-classification task, we show that they outperform all other feature types at predicting upward trend signals, a significant result for infectious disease modelling in areas where epidemiological data is unreliable. Subsequently, in a time-series forecasting task we fully utilise the predictive power of the caseload and compare the relative strengths of using different supplementary datasets as covariate feature sets in a transformer-based time-series model.
Deep learning architectures, specifically Deep Momentum Networks (DMNs) [1904.04912], have been found to be an effective approach to momentum and mean-reversion trading. However, some of the key challenges in recent years involve learning long-term dependencies, degradation of performance when considering returns net of transaction costs and adapting to new market regimes, notably during the SARS-CoV-2 crisis. Attention mechanisms, or Transformer-based architectures, are a solution to such challenges because they allow the network to focus on significant time steps in the past and longer-term patterns. We introduce the Momentum Transformer, an attention-based architecture which outperforms the benchmarks, and is inherently interpretable, providing us with greater insights into our deep learning trading strategy. Our model is an extension to the LSTM-based DMN, which directly outputs position sizing by optimising the network on a risk-adjusted performance metric, such as Sharpe ratio. We find an attention-LSTM hybrid Decoder-Only Temporal Fusion Transformer (TFT) style architecture is the best performing model. In terms of interpretability, we observe remarkable structure in the attention patterns, with significant peaks of importance at momentum turning points. The time series is thus segmented into regimes and the model tends to focus on previous time-steps in alike regimes. We find changepoint detection (CPD) [2105.13727], another technique for responding to regime change, can complement multi-headed attention, especially when we run CPD at multiple timescales. Through the addition of an interpretable variable selection network, we observe how CPD helps our model to move away from trading predominantly on daily returns data. We note that the model can intelligently switch between, and blend, classical strategies - basing its decision on patterns in the data.
We develop FinText, a novel, state-of-the-art, financial word embedding from Dow Jones Newswires Text News Feed Database. Incorporating this word embedding in a machine learning model produces a substantial increase in volatility forecasting performance on days with volatility jumps for 23 NASDAQ stocks from 27 July 2007 to 18 November 2016. A simple ensemble model, combining our word embedding and another machine learning model that uses limit order book data, provides the best forecasting performance for both normal and jump volatility days. Finally, we use Integrated Gradients and SHAP (SHapley Additive exPlanations) to make the results more 'explainable' and the model comparisons more transparent.
Momentum strategies are an important part of alternative investments and are at the heart of commodity trading advisors (CTAs). These strategies have however been found to have difficulties adjusting to rapid changes in market conditions, such as during the 2020 market crash. In particular, immediately after momentum turning points, where a trend reverses from an uptrend (downtrend) to a downtrend (uptrend), time-series momentum (TSMOM) strategies are prone to making bad bets. To improve the response to regime change, we introduce a novel approach, where we insert an online change-point detection (CPD) module into a Deep Momentum Network (DMN) [1904.04912] pipeline, which uses an LSTM deep-learning architecture to simultaneously learn both trend estimation and position sizing. Furthermore, our model is able to optimise the way in which it balances 1) a slow momentum strategy which exploits persisting trends, but does not overreact to localised price moves, and 2) a fast mean-reversion strategy regime by quickly flipping its position, then swapping it back again to exploit localised price moves. Our CPD module outputs a changepoint location and severity score, allowing our model to learn to respond to varying degrees of disequilibrium, or smaller and more localised changepoints, in a data driven manner. Using a portfolio of 50, liquid, continuous futures contracts over the period 1990-2020, the addition of the CPD module leads to an improvement in Sharpe ratio of one-third. Even more notably, this module is especially beneficial in periods of significant nonstationarity, and in particular, over the most recent years tested (2015-2020) the performance boost is approximately two-thirds. This is especially interesting as traditional momentum strategies have been underperforming in this period.
Continual Learning (CL) considers the problem of training an agent sequentially on a set of tasks while seeking to retain performance on all previous tasks. A key challenge in CL is catastrophic forgetting, which arises when performance on a previously mastered task is reduced when learning a new task. While a variety of methods exist to combat forgetting, in some cases tasks are fundamentally incompatible with each other and thus cannot be learnt by a single policy. This can occur, in reinforcement learning (RL) when an agent may be rewarded for achieving different goals from the same observation. In this paper we formalize this ``interference'' as distinct from the problem of forgetting. We show that existing CL methods based on single neural network predictors with shared replay buffers fail in the presence of interference. Instead, we propose a simple method, OWL, to address this challenge. OWL learns a factorized policy, using shared feature extraction layers, but separate heads, each specializing on a new task. The separate heads in OWL are used to prevent interference. At test time, we formulate policy selection as a multi-armed bandit problem, and show it is possible to select the best policy for an unknown task using feedback from the environment. The use of bandit algorithms allows the OWL agent to constructively re-use different continually learnt policies at different times during an episode. We show in multiple RL environments that existing replay based CL methods fail, while OWL is able to achieve close to optimal performance when training sequentially.
We design multi-horizon forecasting models for limit order book (LOB) data by using deep learning techniques. Unlike standard structures where a single prediction is made, we adopt encoder-decoder models with sequence-to-sequence and Attention mechanisms, to generate a forecasting path. Our methods achieve comparable performance to state-of-art algorithms at short prediction horizons. Importantly, they outperform when generating predictions over long horizons by leveraging the multi-horizon setup. Given that encoder-decoder models rely on recurrent neural layers, they generally suffer from a slow training process. To remedy this, we experiment with utilising novel hardware, so-called Intelligent Processing Units (IPUs) produced by Graphcore. IPUs are specifically designed for machine intelligence workload with the aim to speed up the computation process. We show that in our setup this leads to significantly faster training times when compared to training models with GPUs.
The performance of a cross-sectional currency strategy depends crucially on accurately ranking instruments prior to portfolio construction. While this ranking step is traditionally performed using heuristics, or by sorting outputs produced by pointwise regression or classification models, Learning to Rank algorithms have recently presented themselves as competitive and viable alternatives. Despite improving ranking accuracy on average however, these techniques do not account for the possibility that assets positioned at the extreme ends of the ranked list -- which are ultimately used to construct the long/short portfolios -- can assume different distributions in the input space, and thus lead to sub-optimal strategy performance. Drawing from research in Information Retrieval that demonstrates the utility of contextual information embedded within top-ranked documents to learn the query's characteristics to improve ranking, we propose an analogous approach: exploiting the features of both out- and under-performing instruments to learn a model for refining the original ranked list. Under a re-ranking framework, we adapt the Transformer architecture to encode the features of extreme assets for refining our selection of long/short instruments obtained with an initial retrieval. Backtesting on a set of 31 currencies, our proposed methodology significantly boosts Sharpe ratios -- by approximately 20% over the original LTR algorithms and double that of traditional baselines.
Market by order (MBO) data - a detailed feed of individual trade instructions for a given stock on an exchange - is arguably one of the most granular sources of microstructure information. While limit order books (LOBs) are implicitly derived from it, MBO data is largely neglected by current academic literature which focuses primarily on LOB modelling. In this paper, we demonstrate the utility of MBO data for forecasting high-frequency price movements, providing an orthogonal source of information to LOB snapshots. We provide the first predictive analysis on MBO data by carefully introducing the data structure and presenting a specific normalisation scheme to consider level information in order books and to allow model training with multiple instruments. Through forecasting experiments using deep neural networks, we show that while MBO-driven and LOB-driven models individually provide similar performance, ensembles of the two can lead to improvements in forecasting accuracy -- indicating that MBO data is additive to LOB-based features.
The success of a cross-sectional systematic strategy depends critically on accurately ranking assets prior to portfolio construction. Contemporary techniques perform this ranking step either with simple heuristics or by sorting outputs from standard regression or classification models, which have been demonstrated to be sub-optimal for ranking in other domains (e.g. information retrieval). To address this deficiency, we propose a framework to enhance cross-sectional portfolios by incorporating learning-to-rank algorithms, which lead to improvements of ranking accuracy by learning pairwise and listwise structures across instruments. Using cross-sectional momentum as a demonstrative case study, we show that the use of modern machine learning ranking algorithms can substantially improve the trading performance of cross-sectional strategies -- providing approximately threefold boosting of Sharpe Ratios compared to traditional approaches.