Get our free extension to see links to code for papers anywhere online!

# "Time Series Analysis": models, code, and papers

## Correlated daily time series and forecasting in the M4 competition

We participated in the M4 competition for time series forecasting and describe here our methods for forecasting daily time series. We used an ensemble of five statistical forecasting methods and a method that we refer to as the correlator. Our retrospective analysis using the ground truth values published by the M4 organisers after the competition demonstrates that the correlator was responsible for most of our gains over the naive constant forecasting method. We identify data leakage as one reason for its success, partly due to test data selected from different time intervals, and partly due to quality issues in the original time series. We suggest that future forecasting competitions should provide actual dates for the time series so that some of those leakages could be avoided by the participants.

* International Journal of Forecasting, 36(1), 121-128 (2020)

## Correlated daily time series and forecasting in M4 competition

We participated in the M4 competition for time series forecasting and describe here our methods for forecasting daily time series. We used an ensemble of five statistical forecasting methods and a method that we refer to as the correlator. Our retrospective analysis using the ground truth values published by the M4 organisers after the competition demonstrates that the correlator was responsible for most of our gains over the naive constant forecasting method. We identify data leakage as one reason for its success, partly due to test data selected from different time intervals, and partly due to quality issues in the original time series. We suggest future forecasting competitions to provide actual dates for the time series so that some of those leakages could be avoided by the participants.

* International Journal of Forecasting, 36(1), 121-128 (2020)

## Time-Series Analysis via Low-Rank Matrix Factorization Applied to Infant-Sleep Data

We propose a nonparametric model for time series with missing data based on low-rank matrix factorization. The model expresses each instance in a set of time series as a linear combination of a small number of shared basis functions. Constraining the functions and the corresponding coefficients to be nonnegative yields an interpretable low-dimensional representation of the data. A time-smoothing regularization term ensures that the model captures meaningful trends in the data, instead of overfitting short-term fluctuations. The low-dimensional representation makes it possible to detect outliers and cluster the time series according to the interpretable features extracted by the model, and also to perform forecasting via kernel regression. We apply our methodology to a large real-world dataset of infant-sleep data gathered by caregivers with a mobile-phone app. Our analysis automatically extracts daily-sleep patterns consistent with the existing literature. This allows us to compute sleep-development trends for the cohort, which characterize the emergence of circadian sleep and different napping habits. We apply our methodology to detect anomalous individuals, to cluster the cohort into groups with different sleeping tendencies, and to obtain improved predictions of future sleep behavior.

## Machine learning applications in time series hierarchical forecasting

Dec 01, 2019
Mahdi Abolghasemi, Rob J Hyndman, Garth Tarr, Christoph Bergmeir

Hierarchical forecasting (HF) is needed in many situations in the supply chain (SC) because managers often need different levels of forecasts at different levels of SC to make a decision. Top-Down (TD), Bottom-Up (BU) and Optimal Combination (COM) are common HF models. These approaches are static and often ignore the dynamics of the series while disaggregating them. Consequently, they may fail to perform well if the investigated group of time series are subject to large changes such as during the periods of promotional sales. We address the HF problem of predicting real-world sales time series that are highly impacted by promotion. We use three machine learning (ML) models to capture sales variations over time. Artificial neural networks (ANN), extreme gradient boosting (XGboost), and support vector regression (SVR) algorithms are used to estimate the proportions of lower-level time series from the upper level. We perform an in-depth analysis of 61 groups of time series with different volatilities and show that ML models are competitive and outperform some well-established models in the literature.

## Bag of Recurrence Patterns Representation for Time-Series Classification

Mar 29, 2018
Nima Hatami, Yann Gavet, Johan Debayle

Time-Series Classification (TSC) has attracted a lot of attention in pattern recognition, because wide range of applications from different domains such as finance and health informatics deal with time-series signals. Bag of Features (BoF) model has achieved a great success in TSC task by summarizing signals according to the frequencies of "feature words" of a data-learned dictionary. This paper proposes embedding the Recurrence Plots (RP), a visualization technique for analysis of dynamic systems, in the BoF model for TSC. While the traditional BoF approach extracts features from 1D signal segments, this paper uses the RP to transform time-series into 2D texture images and then applies the BoF on them. Image representation of time-series enables us to explore different visual descriptors that are not available for 1D signals and to treats TSC task as a texture recognition problem. Experimental results on the UCI time-series classification archive demonstrates a significant accuracy boost by the proposed Bag of Recurrence patterns (BoR), compared not only to the existing BoF models, but also to the state-of-the art algorithms.

* Pattern Analysis and Applications Journal, 2018

## Equivalence relations and \$L^p\$ distances between time series

Feb 07, 2020
Nick James, Max Menzies

We introduce a general framework for defining equivalence and measuring distances between time series, and a first concrete method for doing so. We prove the existence of equivalence relations on the space of time series, such that the quotient spaces can be equipped with a metrizable topology. We illustrate algorithmically how to calculate such distances among a collection of time series, and perform clustering analysis based on these distances. We apply these insights to analyse the recent bushfires in NSW, Australia. There, we introduce a new method to analyse time series in a cross-contextual setting.

* Equal contribution

## Anomaly Detection in Time Series with Triadic Motif Fields and Application in Atrial Fibrillation ECG Classification

Dec 09, 2020

In the time-series analysis, the time series motifs and the order patterns in time series can reveal general temporal patterns and dynamic features. Triadic Motif Field (TMF) is a simple and effective time-series image encoding method based on triadic time series motifs. Electrocardiography (ECG) signals are time-series data widely used to diagnose various cardiac anomalies. The TMF images contain the features characterizing the normal and Atrial Fibrillation (AF) ECG signals. Considering the quasi-periodic characteristics of ECG signals, the dynamic features can be extracted from the TMF images with the transfer learning pre-trained convolutional neural network (CNN) models. With the extracted features, the simple classifiers, such as the Multi-Layer Perceptron (MLP), the logistic regression, and the random forest, can be applied for accurate anomaly detection. With the test dataset of the PhysioNet Challenge 2017 database, the TMF classification model with the VGG16 transfer learning model and MLP classifier demonstrates the best performance with the 95.50% ROC-AUC and 88.43% F1 score in the AF classification. Besides, the TMF classification model can identify AF patients in the test dataset with high precision. The feature vectors extracted from the TMF images show clear patient-wise clustering with the t-distributed Stochastic Neighbor Embedding technique. Above all, the TMF classification model has very good clinical interpretability. The patterns revealed by symmetrized Gradient-weighted Class Activation Mapping have a clear clinical interpretation at the beat and rhythm levels.

## Path Signature Area-Based Causal Discovery in Coupled Time Series

Oct 23, 2021

Coupled dynamical systems are frequently observed in nature, but often not well understood in terms of their causal structure without additional domain knowledge about the system. Especially when analyzing observational time series data of dynamical systems where it is not possible to conduct controlled experiments, for example time series of climate variables, it can be challenging to determine how features causally influence each other. There are many techniques available to recover causal relationships from data, such as Granger causality, convergent cross mapping, and causal graph structure learning approaches such as PCMCI. Path signatures and their associated signed areas provide a new way to approach the analysis of causally linked dynamical systems, particularly in informing a model-free, data-driven approach to algorithmic causal discovery. With this paper, we explore the use of path signatures in causal discovery and propose the application of confidence sequences to analyze the significance of the magnitude of the signed area between two variables. These confidence sequence regions converge with greater sampling length, and in conjunction with analyzing pairwise signed areas across time-shifted versions of the time series, can help identify the presence of lag/lead causal relationships. This approach provides a new way to define the confidence of a causal link existing between two time series, and ultimately may provide a framework for hypothesis testing to define whether one time series causes another

## Benchmarking time series classification -- Functional data vs machine learning approaches

Time series classification problems have drawn increasing attention in the machine learning and statistical community. Closely related is the field of functional data analysis (FDA): it refers to the range of problems that deal with the analysis of data that is continuously indexed over some domain. While often employing different methods, both fields strive to answer similar questions, a common example being classification or regression problems with functional covariates. We study methods from functional data analysis, such as functional generalized additive models, as well as functionality to concatenate (functional-) feature extraction or basis representations with traditional machine learning algorithms like support vector machines or classification trees. In order to assess the methods and implementations, we run a benchmark on a wide variety of representative (time series) data sets, with in-depth analysis of empirical results, and strive to provide a reference ranking for which method(s) to use for non-expert practitioners. Additionally, we provide a software framework in R for functional data analysis for supervised learning, including machine learning and more linear approaches from statistics. This allows convenient access, and in connection with the machine-learning toolbox mlr, those methods can now also be tuned and benchmarked.

## A novel method of fuzzy time series forecasting based on interval index number and membership value using support vector machine

Oct 20, 2020
Kiran Bisht, Arun Kumar

Fuzzy time series forecasting methods are very popular among researchers for predicting future values as they are not based on the strict assumptions of traditional time series forecasting methods. Non-stochastic methods of fuzzy time series forecasting are preferred by the researchers as they provide more significant forecasting results. There are generally, four factors that determine the performance of the forecasting method (1) number of intervals (NOIs) and length of intervals to partition universe of discourse (UOD) (2) fuzzification rules or feature representation of crisp time series (3) method of establishing fuzzy logic rule (FLRs) between input and target values (4) defuzzification rule to get crisp forecasted value. Considering the first two factors to improve the forecasting accuracy, we proposed a novel non-stochastic method fuzzy time series forecasting in which interval index number and membership value are used as input features to predict future value. We suggested a simple rounding-off range and suitable step size method to find the optimal number of intervals (NOIs) and used fuzzy c-means clustering process to divide UOD into intervals of unequal length. We implement support vector machine (SVM) to establish FLRs. To test our proposed method we conduct a simulated study on five widely used real time series and compare the performance with some recently developed models. We also examine the performance of the proposed model by using multi-layer perceptron (MLP) instead of SVM. Two performance measures RSME and SMAPE are used for performance analysis and observed better forecasting accuracy by the proposed model.

<<
8
9
10
11
12
13
14
15
16
17
18
19
20
>>