"Time Series Analysis": models, code, and papers

Variable-lag Granger Causality and Transfer Entropy for Time Series Analysis

Feb 01, 2020
Chainarong Amornbunchornvej, Elena Zheleva, Tanya Berger-Wolf

Granger causality is a fundamental technique for causal inference in time series data, commonly used in the social and biological sciences. Typical operationalizations of Granger causality make a strong assumption that every time point of the effect time series is influenced by a combination of other time series with a fixed time delay. The assumption of fixed time delay also exists in Transfer Entropy, which is considered to be a non-linear version of Granger causality. However, the assumption of the fixed time delay does not hold in many applications, such as collective behavior, financial markets, and many natural phenomena. To address this issue, we develop variable-lag Granger causality and Transfer Entropy, generalizations of both Granger causality and Transfer Entropy that relax the assumption of the fixed time delay and allow causes to influence effects with arbitrary time delays. In addition, we propose a method for inferring both variable-lag Granger causality and Transfer Entropy relations. We demonstrate our approach on an application for studying coordinated collective behavior and other real-world causal-inference datasets and show that our proposed approaches perform better than several existing methods in both simulated and real-world datasets. Our approach can be applied in any domain of time series analysis. The software of this work is available in the R package: VLTimeSeriesCausality.

* This preprint is the extension of the work [arXiv:1912.10829] entitled "Variable-lag Granger Causality for Time Series Analysis" by the same authors. The R package is available at 
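
For readers unfamiliar with the fixed-delay assumption that the abstract above relaxes, the following is a minimal sketch of a conventional fixed-lag Granger test, assuming a single lag and ordinary least squares. It is not the authors' variable-lag method and does not use the VLTimeSeriesCausality package; the function names and the synthetic data are hypothetical and exist only for illustration.

```python
import random


def solve_least_squares(rows, targets):
    """Solve the normal equations (X^T X) beta = X^T y by Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def residual_sum_of_squares(rows, targets):
    beta = solve_least_squares(rows, targets)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))


def granger_f_statistic(x, y, lag):
    """F statistic for 'x at one fixed lag (lag >= 1) helps predict y' beyond y's own past."""
    times = range(lag, len(y))
    targets = [y[t] for t in times]
    restricted = [[1.0, y[t - 1]] for t in times]               # y's own past only
    unrestricted = [[1.0, y[t - 1], x[t - lag]] for t in times]  # plus lagged x
    rss_r = residual_sum_of_squares(restricted, targets)
    rss_u = residual_sum_of_squares(unrestricted, targets)
    dof = len(targets) - 3  # observations minus unrestricted parameters
    return (rss_r - rss_u) / (rss_u / dof)  # one added regressor


# Synthetic example (assumption): y follows x with a fixed delay of 2 steps.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(400)]
y = [0.0, 0.0] + [0.8 * x[t - 2] + 0.1 * rng.gauss(0, 1) for t in range(2, 400)]

f_xy = granger_f_statistic(x, y, lag=2)  # x -> y: expected to be large
f_yx = granger_f_statistic(y, x, lag=2)  # y -> x: expected to be small
```

Because the test inspects only one delay, it would miss an influence whose delay changes over time, which is the situation the variable-lag generalization is designed for.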

Expressway visibility estimation based on image entropy and piecewise stationary time series analysis

Apr 08, 2018
Xiaogang Cheng, Guoqing Liu, Anders Hedman, Kun Wang, Haibo Li

Vision-based methods for visibility estimation can play a critical role in reducing traffic accidents caused by fog and haze. To overcome the disadvantages of current visibility estimation methods, we present a novel data-driven approach based on Gaussian image entropy and piecewise stationary time series analysis (SPEV). This is the first time that Gaussian image entropy is used for estimating atmospheric visibility. To lessen the impact of landscape and sunshine illuminance on visibility estimation, we used region of interest (ROI) analysis and took into account relative ratios of image entropy to improve estimation accuracy. We assume that fog and haze cause blurred images and that fog and haze can be considered as a piecewise stationary signal. We used piecewise stationary time series analysis to construct the piecewise causal relationship between image entropy and visibility. To obtain a real-world visibility measure during fog and haze, a subjective assessment was established through a study with 36 subjects who performed visibility observations. Finally, a total of two million videos were used to train the SPEV model and validate its effectiveness. The videos were collected from the constantly foggy and hazy Tongqi expressway in Jiangsu, China. The contrast model of visibility estimation was used for algorithm performance comparison, and the validation results of the SPEV model were encouraging, as 99.14% of the relative errors were less than 10%.
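
The intuition that blurred, foggy images carry less entropy can be illustrated with a small sketch. It computes the plain Shannon entropy of a grayscale intensity histogram, which is an assumption made for illustration: it is not the Gaussian image entropy used by SPEV, and it ignores the ROI step. The function name and the toy pixel grids are hypothetical.

```python
import math
from collections import Counter


def image_entropy(pixels):
    """Shannon entropy, in bits, of the intensity histogram of a grayscale image."""
    values = [v for row in pixels for v in row]
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy 2x4 images (assumption): a high-contrast scene and a washed-out foggy one.
sharp = [[0, 255, 64, 192], [128, 32, 224, 96]]        # eight distinct gray levels
foggy = [[128, 128, 130, 130], [128, 130, 128, 130]]   # only two gray levels

sharp_entropy = image_entropy(sharp)
foggy_entropy = image_entropy(foggy)
```

In this toy case the foggy image has the lower entropy, since its intensities collapse onto fewer gray levels.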


Compact representation of temporal processes in echosounder time series via matrix decomposition

Jul 06, 2020
Wu-Jung Lee, Valentina Staneva

Echosounders are high-frequency sonar systems widely used to observe mid-trophic level animals in the ocean. The recent deluge of echosounder data from diverse ocean observing platforms has created unprecedented opportunities to study the marine ecosystems at broad scales. However, there is a critical lack of methods capable of automatic and adaptive extraction of ecologically relevant spatio-temporal structures from echosounder observation, limiting effective and wider use of these rich datasets in marine ecological research. Here we present a data-driven methodology based on matrix decomposition that builds a compact representation of long-term echosounder time series using intrinsic features in the data, and demonstrate its utility by analyzing an example multi-frequency dataset from the northeast Pacific Ocean. We show that Principal Component Pursuit (PCP) successfully removes noise interference from the data, and that a temporally smooth Nonnegative Matrix Factorization (tsNMF) automatically discovers a small number of distinct daily echogram patterns, whose time-varying linear combination (activation) reconstructs the dominant structures in the original time series. This low-rank representation is more tractable and interpretable than the original time series. It is also suitable for visualization and systematic analysis with other ocean variables such as currents. Unlike existing echo analysis methods that rely on fixed, handcrafted rules, the data-driven and thus adaptable nature of our methodology is well-suited for analyzing data collected from unfamiliar ecosystems or ecosystems undergoing rapid changes in the changing climate. Future developments and applications based on this work will catalyze advancements in marine ecology by providing robust time series analytics for large-scale, acoustics-based biological observation in the ocean.


COBRAS-TS: A new approach to Semi-Supervised Clustering of Time Series

May 02, 2018
Toon Van Craenendonck, Wannes Meert, Sebastijan Dumancic, Hendrik Blockeel

Clustering is ubiquitous in data analysis, including analysis of time series. It is inherently subjective: different users may prefer different clusterings for a particular dataset. Semi-supervised clustering addresses this by allowing the user to provide examples of instances that should (not) be in the same cluster. This paper studies semi-supervised clustering in the context of time series. We show that COBRAS, a state-of-the-art semi-supervised clustering method, can be adapted to this setting. We refer to this approach as COBRAS-TS. An extensive experimental evaluation supports the following claims: (1) COBRAS-TS far outperforms the current state of the art in semi-supervised clustering for time series, and thus presents a new baseline for the field; (2) COBRAS-TS can identify clusters with separated components; (3) COBRAS-TS can identify clusters that are characterized by small local patterns; (4) a small amount of semi-supervision can greatly improve clustering quality for time series; (5) the choice of the clustering algorithm matters (contrary to earlier claims in the literature).


TSViz: Demystification of Deep Learning Models for Time-Series Analysis

Feb 08, 2018
Shoaib Ahmed Siddiqui, Dominik Mercier, Mohsin Munir, Andreas Dengel, Sheraz Ahmed

This paper presents a novel framework for demystification of convolutional deep learning models for time series analysis. This is a step towards making informed/explainable decisions in the domain of time series, powered by deep learning. There have been numerous efforts to increase the interpretability of image-centric deep neural network models, where the learned features are more intuitive to visualize. Visualization in time-series is much more complicated, as there is no direct interpretation of the filters and inputs compared to the image modality. In addition, little or no attention has been devoted to the development of such tools in the domain of time-series in the past. The visualization engine of the presented framework provides possibilities to explore and analyze a network from different dimensions at four different levels of abstraction. This enables the user to uncover different aspects of the model, including important filters, filter clusters, and input saliency maps. These representations help users understand the network features so that the acceptability of deep networks for time-series data can be enhanced. This is extremely important in domains like finance, industry 4.0, self-driving cars, health-care, counter-terrorism etc., where the reasons for reaching a particular prediction are as important as the prediction itself. The framework can also aid in the discovery of filters that contribute nothing to the final prediction and hence can be pruned without any significant loss in performance.

* 7 Pages (6 + 1 for references), 7 figures 

Real-time Drift Detection on Time-series Data

Oct 12, 2021
Nandini Ramanan, Rasool Tahmasbi, Marjorie Sayer, Deokwoo Jung, Shalini Hemachandran, Claudionor Nunes Coelho Jr

Practical machine learning applications involving time series data, such as firewall log analysis to proactively detect anomalous behavior, are concerned with real-time analysis of streaming data. Consequently, we need to update the ML models, as the statistical characteristics of such data may shift frequently with time. One alternative explored in the literature is to retrain models with updated data whenever the model's accuracy is observed to degrade. However, these methods rely on near real-time availability of ground truth, which is rarely fulfilled. Further, in applications with seasonal data, temporal concept drift is confounded by seasonal variation. In this work, we propose an approach called Unsupervised Temporal Drift Detector or UTDD to flexibly account for seasonal variation, efficiently detect temporal concept drift in time series data in the absence of ground truth, and subsequently adapt our ML models to concept drift for better generalization.

* 5 pages, 5 figures 

catch22: CAnonical Time-series CHaracteristics

Jan 30, 2019
Carl H Lubba, Sarab S Sethi, Philip Knaute, Simon R Schultz, Ben D Fulcher, Nick S Jones

Capturing the dynamical properties of time series concisely as interpretable feature vectors can enable efficient clustering and classification for time-series applications across science and industry. Selecting an appropriate feature-based representation of time series for a given application can be achieved through systematic comparison across a comprehensive time-series feature library, such as those in the hctsa toolbox. However, this approach is computationally expensive and involves evaluating many similar features, limiting the widespread adoption of feature-based representations of time series for real-world applications. In this work, we introduce a method to infer small sets of time-series features that (i) exhibit strong classification performance across a given collection of time-series problems, and (ii) are minimally redundant. Applying our method to a set of 93 time-series classification datasets (containing over 147000 time series) and using a filtered version of the hctsa feature library (4791 features), we introduce a generically useful set of 22 CAnonical Time-series CHaracteristics, catch22. This dimensionality reduction, from 4791 to 22, is associated with an approximately 1000-fold reduction in computation time and near linear scaling with time-series length, despite an average reduction in classification accuracy of just 7%. catch22 captures a diverse and interpretable signature of time series in terms of their properties, including linear and non-linear autocorrelation, successive differences, value distributions and outliers, and fluctuation scaling properties. We provide an efficient implementation of catch22, accessible from many programming environments, that facilitates feature-based time-series analysis for scientific, industrial, financial and medical applications using a common language of interpretable time-series properties.


Novel techniques for improvement the NNetEn entropy calculation for short and noisy time series

Feb 25, 2022
Hanif Heidari, Andrei Velichko

Entropy is a fundamental concept of information theory. It is widely used in the analysis of analog and digital signals. Conventional entropy measures have drawbacks, such as sensitivity to the length and amplitude of time series and low robustness to external noise. Recently, the NNetEn entropy measure has been introduced to overcome these problems. The NNetEn entropy uses a modified version of the LogNNet neural network classification model. The algorithm contains a reservoir matrix with N = 19625 elements, which the given time series should fill. Many practical time series have fewer than 19625 elements. Against this background, this paper investigates different duplicating and stretching techniques for filling the reservoir to overcome this difficulty. The most successful technique is identified for practical applications. The presence of external noise and bias are other important issues affecting the efficiency of entropy measures. In order to perform meaningful analysis, three time series with different dynamics (chaotic, periodic, and binary), with a variation of signal-to-noise ratio (SNR) and offsets, are considered. It is shown that the error in the calculation of the NNetEn entropy does not exceed 10% when the SNR exceeds 30 dB. This opens the possibility of measuring the NNetEn of experimental signals in the presence of noise of various nature, white noise, or 1/f noise, without the need for noise filtering.

* 17 pages, 19 figures, 2 tables
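
The abstract above does not spell out its duplicating and stretching variants, so the sketch below shows only one plausible reading of each, as an assumption: cyclic repetition, and linear interpolation up to the reservoir length. The function names are hypothetical and the code does not reproduce the NNetEn implementation; only the value N = 19625 comes from the abstract.

```python
RESERVOIR_SIZE = 19625  # N, as stated in the abstract


def fill_by_duplication(series, size=RESERVOIR_SIZE):
    """Repeat the series cyclically until `size` elements are filled."""
    return [series[i % len(series)] for i in range(size)]


def fill_by_stretching(series, size=RESERVOIR_SIZE):
    """Stretch the series to `size` elements (size >= 2) by linear interpolation."""
    if len(series) == 1:
        return [series[0]] * size
    step = (len(series) - 1) / (size - 1)  # source index advanced per output element
    stretched = []
    for i in range(size):
        pos = i * step
        left = min(int(pos), len(series) - 2)  # keep left + 1 inside the series
        frac = pos - left
        stretched.append(series[left] * (1 - frac) + series[left + 1] * frac)
    return stretched


short_series = [0.0, 1.0, 2.0]
duplicated = fill_by_duplication(short_series, size=7)
stretched = fill_by_stretching(short_series, size=5)
full_length = fill_by_stretching(short_series)
```

Duplication preserves the original sample values but introduces an artificial period, whereas stretching preserves the overall shape but smooths the signal between samples, so the two approaches can be expected to affect an entropy estimate differently.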