"Time Series Analysis": models, code, and papers

Signal Processing on Graphs: Causal Modeling of Unstructured Data

Feb 08, 2017
Jonathan Mei, José M. F. Moura

Many applications collect a large number of time series, for example, the financial data of companies quoted in a stock exchange, the health care data of all patients that visit the emergency room of a hospital, or the temperature sequences continuously measured by weather stations across the US. These data are often referred to as unstructured. A first task in their analysis is to derive a low dimensional representation, a graph or discrete manifold, that describes well the interrelations among the time series and their intrarelations across time. This paper presents a computationally tractable algorithm for estimating this graph that structures the data. The resulting graph is directed and weighted, possibly capturing causal relations, not just reciprocal correlations as in many existing approaches in the literature. A convergence analysis is carried out. The algorithm is demonstrated on random graph datasets and real network time series datasets, and its performance is compared to that of related methods. The adjacency matrices estimated with the new method are close to the true graph in the simulated data and consistent with prior physical knowledge in the real dataset tested.

* IEEE Transactions on Signal Processing, vol. 65, no. 8, pp. 2077-2092, April 15, 2017 

Applying Regression Conformal Prediction with Nearest Neighbors to time series data

Oct 25, 2021
Samya Tajmouati, Bouazza El Wahbi, Mohammed Dakkoun

In this paper, we apply conformal prediction to time series data. Conformal prediction is a method that produces predictive regions given a confidence level. The output regions are always valid under the exchangeability assumption. However, this assumption does not hold for time series data because there is a link among past, current, and future observations. Consequently, the challenge of applying conformal predictors to time series data lies in the fact that observations of a time series are dependent and therefore do not meet the exchangeability assumption. This paper aims to present a way of constructing reliable prediction intervals by using conformal predictors in the context of time series. We use the nearest neighbors method based on the fast parameters tuning technique in the weighted nearest neighbors (FPTO-WNN) approach as the underlying algorithm. Data analysis demonstrates the effectiveness of the proposed approach.
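
To make the idea of a conformal prediction interval concrete, here is a minimal sketch of split (inductive) conformal regression on lagged windows of a series. It is not the authors' FPTO-WNN method: it swaps in a plain unweighted k-nearest-neighbors regressor, and the function names, the lag length, and the calibration fraction are all assumptions chosen for illustration. Because time series observations are not exchangeable, the interval below carries no formal coverage guarantee.

```python
import numpy as np

def make_lagged(series, lags):
    # Each row holds `lags` consecutive values; the target is the next value.
    x = np.stack([series[i:i + lags] for i in range(len(series) - lags)])
    y = series[lags:]
    return x, y

def knn_predict(train_x, train_y, query_x, k=3):
    # Plain unweighted k-NN regression (a stand-in, not FPTO-WNN).
    dists = np.linalg.norm(train_x[None, :, :] - query_x[:, None, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return train_y[nearest].mean(axis=1)

def split_conformal_interval(series, lags=4, k=3, alpha=0.1, calib_fraction=0.3):
    series = np.asarray(series, dtype=float)
    x, y = make_lagged(series, lags)
    n_calib = int(len(y) * calib_fraction)
    # Keep temporal order: earlier windows train, later windows calibrate.
    x_train, y_train = x[:-n_calib], y[:-n_calib]
    x_calib, y_calib = x[-n_calib:], y[-n_calib:]
    # Nonconformity scores: absolute residuals on the calibration set.
    scores = np.sort(np.abs(y_calib - knn_predict(x_train, y_train, x_calib, k)))
    # Conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score,
    # clipped to the largest score when the rank exceeds n (a simplification).
    rank = min(int(np.ceil((n_calib + 1) * (1 - alpha))), n_calib)
    q = scores[rank - 1]
    # One-step-ahead point forecast from the most recent window.
    point = knn_predict(x_train, y_train, series[-lags:][None, :], k)[0]
    return point - q, point, point + q

rng = np.random.default_rng(0)
t = np.arange(200)
demo_series = np.sin(0.2 * t) + 0.1 * rng.standard_normal(200)
lower, point, upper = split_conformal_interval(demo_series)
print(f"forecast {point:.3f}, 90% interval [{lower:.3f}, {upper:.3f}]")
```

The half-width of the interval is a quantile of past absolute errors, so a better underlying regressor yields tighter intervals at the same nominal confidence level.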

Real-time regression analysis with deep convolutional neural networks

May 07, 2018
E. A. Huerta, Daniel George, Zhizhen Zhao, Gabrielle Allen

We discuss the development of novel deep learning algorithms to enable real-time regression analysis for time series data. We showcase the application of this new method with a timely case study, and then discuss the applicability of this approach to tackle similar challenges across science domains.

* 3 pages. Position Paper accepted to SciML2018: DOE ASCR Workshop on Scientific Machine Learning. North Bethesda, MD, United States, January 30-February 1, 2018 

Predictability, complexity and learning

Jan 23, 2001
William Bialek, Ilya Nemenman, Naftali Tishby

We define predictive information $I_{\rm pred} (T)$ as the mutual information between the past and the future of a time series. Three qualitatively different behaviors are found in the limit of large observation times $T$: $I_{\rm pred} (T)$ can remain finite, grow logarithmically, or grow as a fractional power law. If the time series allows us to learn a model with a finite number of parameters, then $I_{\rm pred} (T)$ grows logarithmically with a coefficient that counts the dimensionality of the model space. In contrast, power-law growth is associated, for example, with the learning of infinite parameter (or nonparametric) models such as continuous functions with smoothness constraints. There are connections between the predictive information and measures of complexity that have been defined both in learning theory and in the analysis of physical systems through statistical mechanics and dynamical systems theory. Further, in the same way that entropy provides the unique measure of available information consistent with some simple and plausible conditions, we argue that the divergent part of $I_{\rm pred} (T)$ provides the unique measure for the complexity of dynamics underlying a time series. Finally, we discuss how these ideas may be useful in different problems in physics, statistics, and biology.

* Neural Computation 13, 2409-2463 (2001) 
* 53 pages, 3 figures, 98 references, LaTeX2e 
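
For readers who want the quantities in the abstract above written out, the following is a sketch in notation assumed here: the symbols $x_{\rm past}$, $x_{\rm future}$, $K$, and $\alpha$ do not appear in the abstract. The first line is the standard mutual information between a past window of duration $T$ and the future. The remaining lines restate the three large-$T$ regimes, with $K$ standing for the number of model parameters; the factor $K/2$ is our reading of the paper's leading-order result and should be checked against the paper itself.

$$
I_{\rm pred}(T) = \left\langle \log_2 \frac{P(x_{\rm future}, x_{\rm past})}{P(x_{\rm future})\, P(x_{\rm past})} \right\rangle
$$

$$
\lim_{T \to \infty} I_{\rm pred}(T) < \infty, \qquad
I_{\rm pred}(T) \approx \frac{K}{2} \log_2 T, \qquad
I_{\rm pred}(T) \propto T^{\alpha}, \quad 0 < \alpha < 1 .
$$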

COHORTNEY: Deep Clustering for Heterogeneous Event Sequences

Apr 03, 2021
Vladislav Zhuzhel, Rodrigo Rivera-Castro, Nina Kaploukhaya, Liliya Mironova, Alexey Zaytsev, Evgeny Burnaev

There is emerging attention towards working with event sequences. In particular, clustering of event sequences is widely applicable in domains such as healthcare, marketing, and finance. Use cases include analysis of visitors to websites, hospitals, or bank transactions. Unlike traditional time series, event sequences tend to be sparse and not equally spaced in time. As a result, they exhibit different properties, which are essential to account for when developing state-of-the-art methods. The community has paid little attention to the specifics of heterogeneous event sequences. Existing research in clustering primarily focuses on classic time series data. It is unclear if proposed methods in the literature generalize well to event sequences. Here we propose COHORTNEY as a novel deep learning method for clustering heterogeneous event sequences. Our contributions include (i) a novel method using a combination of LSTM and the EM algorithm and code implementation; (ii) a comparison of this method to previous research on time series and event sequence clustering; (iii) a performance benchmark of different approaches on a new dataset from the finance industry and fourteen additional datasets. Our results show that COHORTNEY vastly outperforms the state-of-the-art algorithm for clustering event sequences in both speed and cluster quality.

* 19 pages 

Synthesis of Realistic ECG using Generative Adversarial Networks

Sep 19, 2019
Anne Marie Delaney, Eoin Brophy, Tomas E. Ward

Access to medical data is highly restricted due to its sensitive nature, preventing communities from using this data for research or clinical training. Common methods of de-identification implemented to enable the sharing of data are sometimes inadequate to protect the individuals contained in the data. For our research, we investigate the ability of generative adversarial networks (GANs) to produce realistic medical time series data which can be used without concerns over privacy. The aim is to generate synthetic ECG signals representative of normal ECG waveforms. GANs have been used successfully to generate good quality synthetic time series and have been shown to prevent re-identification of individual records. In this work, a range of GAN architectures are developed to generate synthetic sine waves and synthetic ECG. Two evaluation metrics are then used to quantitatively assess how suitable the synthetic data is for real world applications such as clinical training and data analysis. Finally, we discuss the privacy concerns associated with sharing synthetic data produced by GANs and test their ability to withstand a simple membership inference attack. For the first time we both quantitatively and qualitatively demonstrate that GAN architecture can successfully generate time series signals that are not only structurally similar to the training sets but also diverse in nature across generated samples. We also report on their ability to withstand a simple membership inference attack, protecting the privacy of the training set.

Quick, Stat!: A Statistical Analysis of the Quick, Draw! Dataset

Jul 15, 2019
Raul Fernandez-Fernandez, Juan G. Victores, David Estevez, Carlos Balaguer

The Quick, Draw! Dataset is a Google dataset with a collection of 50 million drawings, divided into 345 categories, collected from the users of the game Quick, Draw!. In contrast with most existing image datasets, in the Quick, Draw! Dataset, drawings are stored as time series of pencil positions instead of a bitmap matrix composed of pixels. This aspect makes this dataset the largest doodle dataset available at the time. The Quick, Draw! Dataset is presented as a great opportunity for researchers to develop and study machine learning techniques. Due to the size of this dataset and the nature of its source, there is a scarcity of information about the quality of the drawings it contains. In this paper, a statistical analysis of three of the classes contained in the Quick, Draw! Dataset is presented: mountain, book and whale. The goal is to give the reader a first impression of the data collected in this dataset. For the analysis of the quality of the drawings, a Classification Neural Network was trained to obtain a classification score. Using this classification score and the parameters provided by the dataset, a statistical analysis of the quality and nature of the drawings contained in this dataset is provided.

* 12 pages, Eurosim 2019 

Creating a level playing field for all symbols in a discretization

Oct 18, 2012
Matthew Butler, Dimitar Kazakov

In time series analysis research there is a strong interest in discrete representations of real valued data streams. One approach that emerged over a decade ago and is still considered state-of-the-art is the Symbolic Aggregate Approximation (SAX) algorithm. This discretization algorithm was the first symbolic approach that mapped a real-valued time series to a symbolic representation that was guaranteed to lower-bound Euclidean distance. The interest of this paper concerns the SAX assumption of data being highly Gaussian and the use of the standard normal curve to choose partitions to discretize the data. Generally, and certainly in its canonical form, the SAX approach chooses partitions on the standard normal curve that would produce an equal probability for each symbol in a finite alphabet to occur. This procedure is generally valid as a time series is normalized before the rest of the SAX algorithm is applied. However, there exists a caveat to this assumption of equi-probability due to the intermediate step of Piecewise Aggregate Approximation (PAA). What we will show in this paper is that when PAA is applied the distribution of the data is indeed altered, resulting in a shrinking standard deviation that is proportional to the number of points used to create a segment of the PAA representation and the degree of auto-correlation within the series. Data that exhibits statistically significant auto-correlation is less affected by this shrinking distribution. As the standard deviation of the data contracts, the mean remains the same; however, the distribution is no longer standard normal and therefore the partitions based on the standard normal curve are no longer valid for the assumption of equal probability.
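
The shrinking effect described above can be reproduced with a small numerical experiment. The sketch below is an illustration under our own assumptions, not the authors' code: the helper names, the segment length of 8, and the AR(1) coefficient of 0.9 are arbitrary choices. It z-normalizes a white-noise series and a strongly autocorrelated series, applies PAA, and compares the standard deviations of the segment means.

```python
import numpy as np

def znormalize(x):
    # Standard SAX preprocessing: zero mean, unit standard deviation.
    return (x - x.mean()) / x.std()

def paa(x, segment_length):
    # Piecewise Aggregate Approximation: mean of each non-overlapping segment.
    n = len(x) // segment_length * segment_length
    return x[:n].reshape(-1, segment_length).mean(axis=1)

def ar1(n, phi, rng):
    # Autocorrelated series x[i] = phi * x[i - 1] + Gaussian noise.
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for i in range(1, n):
        x[i] = phi * x[i - 1] + rng.standard_normal()
    return x

rng = np.random.default_rng(0)
white = znormalize(rng.standard_normal(20000))
correlated = znormalize(ar1(20000, 0.9, rng))

std_white = paa(white, 8).std()        # close to 1 / sqrt(8) for independent data
std_correlated = paa(correlated, 8).std()  # shrinks much less
print(f"PAA std, white noise: {std_white:.3f}")
print(f"PAA std, AR(1) 0.9:   {std_correlated:.3f}")
```

In both cases the standard deviation after PAA falls below 1, so breakpoints taken from the standard normal curve no longer split the data into equally likely symbols; the autocorrelated series is affected less, in line with the abstract.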

Serial-EMD: Fast Empirical Mode Decomposition Method for Multi-dimensional Signals Based on Serialization

Jun 22, 2021
Jin Zhang, Fan Feng, Pere Marti-Puig, Cesar F. Caiafa, Zhe Sun, Feng Duan, Jordi Solé-Casals

Empirical mode decomposition (EMD) has developed into a prominent tool for adaptive, scale-based signal analysis in various fields like robotics, security and biomedical engineering. Because the dramatic increase in the amount of data places higher demands on real-time signal analysis, it is difficult for existing EMD and its variants to balance growing data dimension against analysis speed. In order to decompose multi-dimensional signals at a faster speed, we present a novel signal-serialization method (serial-EMD), which concatenates multi-variate or multi-dimensional signals into a one-dimensional signal and uses various one-dimensional EMD algorithms to decompose it. To verify the effects of the proposed method, synthetic multi-variate time series, artificial 2D images with various textures and real-world facial images are tested. Compared with existing multi-EMD algorithms, the decomposition time is significantly reduced. In addition, the results of facial recognition with Intrinsic Mode Functions (IMFs) extracted using our method can achieve a higher accuracy than those obtained by existing multi-EMD algorithms, which demonstrates the superior performance of our method in terms of the quality of IMFs. Furthermore, this method can provide a new perspective to optimize the existing EMD algorithms, that is, transforming the structure of the input signal rather than being constrained by developing envelope computation techniques or signal decomposition methods. In summary, the study suggests that the serial-EMD technique is a highly competitive and fast alternative for multi-dimensional signal analysis.

* 19 pages, 17 figures 

An analysis of deep neural networks for predicting trends in time series data

Sep 22, 2020
Kouame Hermann Kouassi, Deshendran Moodley

Recently, a hybrid Deep Neural Network (DNN) algorithm, TreNet, was proposed for predicting trends in time series data. While TreNet was shown to have superior performance for trend prediction to other DNN and traditional ML approaches, the validation method used did not take into account the sequential nature of time series data sets and did not deal with model update. In this research we replicated the TreNet experiments on the same data sets using a walk-forward validation method and tested our optimal model over multiple independent runs to evaluate model stability. We compared the performance of the hybrid TreNet algorithm on four data sets to vanilla DNN algorithms that take in point data, and also to traditional ML algorithms. We found that in general TreNet still performs better than the vanilla DNN models, but not on all data sets as reported in the original TreNet study. This study highlights the importance of using an appropriate validation method and evaluating model stability for evaluating and developing machine learning models for trend prediction in time series data.
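
Walk-forward validation, as mentioned above, evaluates a model only on data that comes after everything it was trained on, and then moves the boundary forward. The sketch below shows one common variant with an expanding training window. The abstract does not say which variant the authors used, so the function name `walk_forward_splits` and its parameters are assumptions for illustration.

```python
def walk_forward_splits(n_samples, initial_train, test_size):
    """Return (train_indices, test_indices) pairs in temporal order.

    The training window starts with `initial_train` samples and expands by
    `test_size` after each fold, so a model refit on each fold is also
    updated with the newest data.
    """
    splits = []
    train_end = initial_train
    while train_end + test_size <= n_samples:
        train_idx = list(range(0, train_end))
        test_idx = list(range(train_end, train_end + test_size))
        splits.append((train_idx, test_idx))
        train_end += test_size
    return splits

for train_idx, test_idx in walk_forward_splits(10, initial_train=4, test_size=2):
    # Every test index is later than every training index.
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

Unlike a shuffled k-fold split, no fold here lets the model see observations from the future of its test window.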
