"Time Series Analysis": models, code, and papers

Generating Reliable Process Event Streams and Time Series Data based on Neural Networks

Mar 09, 2021
Tobias Herbert, Juergen Mangler, Stefanie Rinderle-Ma

Domains such as manufacturing and medicine crave continuous monitoring and analysis of their processes, especially in combination with time series as produced by sensors. Time series data can be exploited to, for example, explain and predict concept drifts during runtime. Generally, a certain data volume is required in order to produce meaningful analysis results. However, reliable data sets are often missing, for example, if event streams and time series data are collected separately, in the case of a new process, or if it is too expensive to obtain a sufficient data volume. Additional challenges arise with preparing time series data from multiple event sources, variations in data collection frequency, and concept drift. This paper proposes the GENLOG approach to generate reliable event and time series data that follows the distribution of the underlying input data set. GENLOG employs data resampling and enables the user to select different parts of the log data to orchestrate the training of a recurrent neural network for stream generation. The generated data is sampled back to its original sample rate and is embedded into a template representing the log data format it originated from. Overall, GENLOG can boost small data sets and consequently the application of online process mining.


Pay Attention to Evolution: Time Series Forecasting with Deep Graph-Evolution Learning

Aug 28, 2020
Gabriel Spadon, Shenda Hong, Bruno Brandoli, Stan Matwin, Jose F. Rodrigues-Jr, Jimeng Sun

Time-series forecasting is one of the most active research topics in predictive analysis. A still-open gap in that literature is that statistical and ensemble learning approaches systematically present lower predictive performance than deep learning methods as they generally disregard the data sequence aspect entangled with multivariate data represented in more than one time series. Conversely, this work presents a novel neural network architecture for time-series forecasting that combines the power of graph evolution with deep recurrent learning on distinct data distributions; we named our method Recurrent Graph Evolution Neural Network (ReGENN). The idea is to infer multiple multivariate relationships between co-occurring time-series by assuming that the temporal data depends not only on inner variables and intra-temporal relationships (i.e., observations from itself) but also on outer variables and inter-temporal relationships (i.e., observations from other-selves). An extensive set of experiments was conducted comparing ReGENN with dozens of ensemble methods and classical statistical ones, showing sound improvement of up to 64.87% over the competing algorithms. Furthermore, we present an analysis of the intermediate weights arising from ReGENN, showing that by looking at inter- and intra-temporal relationships simultaneously, time-series forecasting is substantially improved when paying attention to how multiple multivariate data synchronously evolve.

* Submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence 

Variable-lag Granger Causality for Time Series Analysis

Dec 18, 2019
Chainarong Amornbunchornvej, Elena Zheleva, Tanya Y. Berger-Wolf

Granger causality is a fundamental technique for causal inference in time series data, commonly used in the social and biological sciences. Typical operationalizations of Granger causality make a strong assumption that every time point of the effect time series is influenced by a combination of other time series with a fixed time delay. However, the assumption of the fixed time delay does not hold in many applications, such as collective behavior, financial markets, and many natural phenomena. To address this issue, we develop variable-lag Granger causality, a generalization of Granger causality that relaxes the assumption of the fixed time delay and allows causes to influence effects with arbitrary time delays. In addition, we propose a method for inferring variable-lag Granger causality relations. We demonstrate our approach on an application for studying coordinated collective behavior and show that it performs better than several existing methods in both simulated and real-world datasets. Our approach can be applied in any domain of time series analysis.

* This paper will appear in the proceedings of the 2019 IEEE International Conference on Data Science and Advanced Analytics (DSAA). The R package is available at 
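
For context on the fixed-delay assumption that the abstract relaxes, here is a minimal sketch of the standard fixed-lag Granger test, not the authors' variable-lag method. It compares an autoregression of the effect series on its own past with one that also includes the past of the candidate cause. The function names `lagged` and `granger_f_stat` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def lagged(v, max_lag):
    """Columns v[t-1], ..., v[t-max_lag] for t = max_lag .. len(v)-1."""
    n = len(v)
    return np.column_stack([v[max_lag - k : n - k] for k in range(1, max_lag + 1)])

def residual_sum_of_squares(design, target):
    """Least-squares fit of target on design; return the residual sum of squares."""
    coef = np.linalg.lstsq(design, target, rcond=None)[0]
    return float(((target - design @ coef) ** 2).sum())

def granger_f_stat(x, y, max_lag):
    """F statistic for 'past x helps predict y' with a fixed maximum lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[max_lag:]
    ones = np.ones((len(target), 1))
    restricted = np.hstack([ones, lagged(y, max_lag)])   # own past only
    full = np.hstack([restricted, lagged(x, max_lag)])   # own past + past of x
    rss_r = residual_sum_of_squares(restricted, target)
    rss_f = residual_sum_of_squares(full, target)
    dof = len(target) - full.shape[1]
    return ((rss_r - rss_f) / max_lag) / (rss_f / dof)

# Synthetic example: y follows x with a constant delay of 2 steps.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.zeros(500)
y[2:] = 0.8 * x[:-2]
y += 0.1 * rng.standard_normal(500)

f_xy = granger_f_stat(x, y, max_lag=3)  # large: x helps predict y
f_yx = granger_f_stat(y, x, max_lag=3)  # small: y does not help predict x
```

Because the delay here is constant, the fixed-lag regression captures it. When the delay changes over time, this setup no longer matches the data, which is the case the paper addresses.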

Fast Randomized Model Generation for Shapelet-Based Time Series Classification

Sep 23, 2012
Daniel Gordon, Danny Hendler, Lior Rokach

Time series classification is a field which has drawn much attention over the past decade. A new approach for classification of time series uses classification trees based on shapelets. A shapelet is a subsequence extracted from one of the time series in the dataset. A disadvantage of this approach is the time required for building the shapelet-based classification tree. The search for the best shapelet requires examining all subsequences of all lengths from all time series in the training set. A key goal of this work was to find an evaluation order of the shapelets space which enables fast convergence to an accurate model. The comparative analysis we conducted clearly indicates that a random evaluation order yields the best results. Our empirical analysis of the distribution of high-quality shapelets within the shapelets space provides insights into why randomized shapelets sampling is superior to alternative evaluation orders. We present an algorithm for randomized model generation for shapelet-based classification that converges extremely quickly to a model with surprisingly high accuracy after evaluating only an exceedingly small fraction of the shapelets space.
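
A minimal sketch of the two ingredients the abstract describes: the distance from a shapelet to a series, and drawing candidate shapelets in random order rather than enumerating every subsequence. The function names and the sampling scheme (pick a series, then a length, then a start position) are assumptions for illustration, not the authors' algorithm.

```python
import numpy as np

def subsequence_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any equal-length window."""
    series = np.asarray(series, dtype=float)
    shapelet = np.asarray(shapelet, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(series, len(shapelet))
    return float(np.sqrt(((windows - shapelet) ** 2).sum(axis=1).min()))

def random_candidates(dataset, n_candidates, min_len, rng):
    """Draw shapelet candidates at random: a series, then a length, then a start."""
    candidates = []
    for _ in range(n_candidates):
        series = dataset[rng.integers(len(dataset))]
        length = rng.integers(min_len, len(series) + 1)
        start = rng.integers(0, len(series) - length + 1)
        candidates.append(series[start : start + length])
    return candidates

# Toy data: four series of length 50.
rng = np.random.default_rng(1)
dataset = [rng.standard_normal(50) for _ in range(4)]
candidates = random_candidates(dataset, n_candidates=10, min_len=5, rng=rng)

# One feature per (series, candidate) pair; a tree would split on these values.
features = np.array([[subsequence_distance(s, c) for c in candidates] for s in dataset])
```

Each column of `features` could then be scored as a split for the classification tree, so only the sampled candidates are evaluated instead of the full shapelet space.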


Highly comparative fetal heart rate analysis

Dec 03, 2014
B. D. Fulcher, A. E. Georgieva, C. W. G. Redman, Nick S. Jones

A database of fetal heart rate (FHR) time series measured from 7221 patients during labor is analyzed with the aim of learning the types of features of these recordings that are informative of low cord pH. Our 'highly comparative' analysis involves extracting over 9000 time-series analysis features from each FHR time series, including measures of autocorrelation, entropy, distribution, and various model fits. This diverse collection of features was developed in previous work, and is publicly available. We describe five features that most accurately classify a balanced training set of 59 'low pH' and 59 'normal pH' FHR recordings. We then describe five of the features with the strongest linear correlation to cord pH across the full dataset of FHR time series. The features identified in this work may be used as part of a system for guiding intervention during labor in future. This work successfully demonstrates the utility of comparing across a large, interdisciplinary literature on time-series analysis to automatically contribute new scientific results for specific biomedical signal processing challenges.

* Fulcher, B. D., Georgieva, A., Redman, C. W., & Jones, N. S. (2012). Highly comparative fetal heart rate analysis (pp. 3135-3138). Presented at the 34th Annual International Conference of the IEEE EMBS, San Diego, CA, USA 
* 7 pages, 4 figures 

Machine Learning of Time Series Using Time-delay Embedding and Precision Annealing

Feb 12, 2019
Alexander J. A. Ty, Zheng Fang, Rivver A. Gonzales, Paul J. Rozdeba, Henry D. I. Abarbanel

Tasking machine learning to predict segments of a time series requires estimating the parameters of a ML model with input/output pairs from the time series. Using the equivalence between statistical data assimilation and supervised machine learning, we revisit this task. The training method for the machine utilizes a precision annealing approach to identifying the global minimum of the action (-log[P]). In this way we are able to identify the number of training pairs required to produce good generalizations (predictions) for the time series. We proceed from a scalar time series $s(t_n); t_n = t_0 + n \Delta t$ and using methods of nonlinear time series analysis show how to produce a $D_E > 1$ dimensional time delay embedding space in which the time series has no false neighbors as does the observed $s(t_n)$ time series. In that $D_E$-dimensional space we explore the use of feed forward multi-layer perceptrons as network models operating on $D_E$-dimensional input and producing $D_E$-dimensional outputs.
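
A minimal sketch of building a time-delay embedding from a scalar series and forming input/output training pairs in the embedding space. The function name `delay_embed`, the delay `tau`, and the forward-delay convention are assumptions for illustration. The abstract's false-nearest-neighbors step for choosing $D_E$ and its precision annealing training are not shown.

```python
import numpy as np

def delay_embed(s, dim, tau=1):
    """Rows are delay vectors [s[n], s[n+tau], ..., s[n+(dim-1)*tau]]."""
    s = np.asarray(s, dtype=float)
    n_vectors = len(s) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for this dim and tau")
    return np.stack([s[i * tau : i * tau + n_vectors] for i in range(dim)], axis=1)

# Scalar series sampled at t_n = t_0 + n * dt.
s = np.sin(0.1 * np.arange(200))
X = delay_embed(s, dim=3, tau=5)

# Training pairs: each D_E-dimensional vector maps to the next one in time.
inputs, targets = X[:-1], X[1:]
```

With `dim=3` and `tau=5`, a series of 200 samples yields 190 delay vectors, so a network with 3 inputs and 3 outputs could be trained on the 189 resulting pairs.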


Extraction of instantaneous frequencies and amplitudes in nonstationary time-series data

Apr 03, 2021
Daniel E. Shea, Rajiv Giridharagopal, David S. Ginger, Steven L. Brunton, J. Nathan Kutz

Time-series analysis is critical for a diversity of applications in science and engineering. By leveraging the strengths of modern gradient descent algorithms, the Fourier transform, multi-resolution analysis, and Bayesian spectral analysis, we propose a data-driven approach to time-frequency analysis that circumvents many of the shortcomings of classic approaches, including the extraction of nonstationary signals with discontinuities in their behavior. The method introduced is equivalent to a nonstationary Fourier mode decomposition (NFMD) for nonstationary and nonlinear temporal signals, allowing for the accurate identification of instantaneous frequencies and their amplitudes. The method is demonstrated on a diversity of time-series data, including on data from cantilever-based electrostatic force microscopy to quantify the time-dependent evolution of charging dynamics at the nanoscale.


Time Series Featurization via Topological Data Analysis: an Application to Cryptocurrency Trend Forecasting

Dec 07, 2018
Kwangho Kim, Jisu Kim, Alessandro Rinaldo

We propose a novel methodology for feature extraction from time series data based on topological data analysis. The proposed procedure applies a dimensionality reduction technique via principal component analysis to the point cloud of the Takens' embedding from the observed time series and then evaluates the persistence landscape and silhouettes based on the corresponding Rips complex. We define a new notion of Rips distance function that is especially suited for persistence homologies built on Rips complexes and prove stability theorems for it. We use these results to demonstrate in turn some stability properties of the topological features extracted using our procedure with respect to additive noise and sampling. We further apply our method to the problem of trend forecasting for cryptocurrency prices, where we manage to achieve significantly lower error rates than more standard, non TDA-based methodologies in complex pattern classification tasks. We expect our method to provide a new insight on feature engineering for granular, noisy time series data.

* 36 pages 

Satellite Image Time Series Analysis for Big Earth Observation Data

Apr 24, 2022
Rolf Simoes, Gilberto Camara, Gilberto Queiroz, Felipe Souza, Pedro R. Andrade, Lorena Santos, Alexandre Carvalho, Karine Ferreira

The development of analytical software for big Earth observation data faces several challenges. Designers need to balance conflicting factors. Solutions that are efficient for specific hardware architectures cannot be used in other environments. Packages that work on generic hardware and open standards will not have the same performance as dedicated solutions. Software that assumes that its users are computer programmers is flexible but may be difficult to learn for a wide audience. This paper describes sits, an open-source R package for satellite image time series analysis using machine learning. To allow experts to use satellite imagery to the fullest extent, sits adopts a time-first, space-later approach. It supports the complete cycle of data analysis for land classification. Its API provides a simple but powerful set of functions. The software works in different cloud computing environments. Satellite image time series are input to machine learning classifiers, and the results are post-processed using spatial smoothing. Since machine learning methods need accurate training data, sits includes methods for quality assessment of training samples. The software also provides methods for validation and accuracy measurement. The package thus comprises a production environment for big EO data analysis. We show that this approach produces high accuracy for land use and land cover maps through a case study in the Cerrado biome, one of the world's fast-moving agricultural frontiers, for the year 2018.

* Remote Sensing, 13,2428, 2021 

GRATIS: GeneRAting TIme Series with diverse and controllable characteristics

Mar 07, 2019
Yanfei Kang, Rob J Hyndman, Feng Li

The explosion of time series data in recent years has brought a flourish of new time series analysis methods, for forecasting, clustering, classification and other tasks. The evaluation of these new methods requires a diverse collection of time series benchmarking data to enable reliable comparisons against alternative approaches. We propose GeneRAting TIme Series with diverse and controllable characteristics, named GRATIS, with the use of mixture autoregressive (MAR) models. We generate sets of time series using MAR models and investigate the diversity and coverage of the generated time series in a time series feature space. By tuning the parameters of the MAR models, GRATIS is also able to efficiently generate new time series with controllable features. In general, as a costless surrogate to the traditional data collection approach, GRATIS can be used as an evaluation tool for tasks such as time series forecasting and classification. We illustrate the usefulness of our time series generation process through a time series forecasting application.
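
A minimal sketch of simulating from a mixture autoregressive (MAR) model with Gaussian components: at each step one AR component is chosen with its mixing weight, and the next value is drawn from that component. The function name `simulate_mar`, the parameter values, and the burn-in length are assumptions for illustration and are not taken from the GRATIS implementation.

```python
import numpy as np

def simulate_mar(n, weights, intercepts, ar_coefs, sigmas, rng, burn_in=100):
    """Simulate n points from a Gaussian mixture autoregressive model."""
    weights = np.asarray(weights, dtype=float)
    order = max(len(c) for c in ar_coefs)
    x = np.zeros(n + burn_in + order)
    for t in range(order, len(x)):
        k = rng.choice(len(weights), p=weights)      # pick a mixture component
        past = x[t - len(ar_coefs[k]) : t][::-1]     # x[t-1], x[t-2], ...
        x[t] = intercepts[k] + np.dot(ar_coefs[k], past) + sigmas[k] * rng.standard_normal()
    return x[-n:]                                     # drop start-up and burn-in

# Two components: an AR(1) and an AR(2), mixed with weights 0.6 and 0.4.
params = dict(
    weights=[0.6, 0.4],
    intercepts=[0.0, 1.0],
    ar_coefs=[[0.5], [-0.3, 0.2]],
    sigmas=[1.0, 0.5],
)
series = simulate_mar(300, rng=np.random.default_rng(42), **params)
```

Changing the weights, coefficients, or noise levels changes the characteristics of the generated series, which is the kind of tuning the abstract refers to.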