Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Time Series Analysis": models, code, and papers

Cats & Co: Categorical Time Series Coclustering

May 06, 2015
Dominique Gay, Romain Guigourès, Marc Boullé, Fabrice Clérot

We suggest a novel method of clustering and exploratory analysis of temporal event sequences data (also known as categorical time series) based on three-dimensional data grid models. A data set of temporal event sequences can be represented as a data set of three-dimensional points, each point is defined by three variables: a sequence identifier, a time value and an event value. Instantiating data grid models to the 3D-points turns the problem into 3D-coclustering. The sequences are partitioned into clusters, the time variable is discretized into intervals and the events are partitioned into clusters. The cross-product of the univariate partitions forms a multivariate partition of the representation space, i.e., a grid of cells and it also represents a nonparametric estimator of the joint distribution of the sequences, time and events dimensions. Thus, the sequences are grouped together because they have similar joint distribution of time and events, i.e., similar distribution of events along the time dimension. The best data grid is computed using a parameter-free Bayesian model selection approach. We also suggest several criteria for exploiting the resulting grid through agglomerative hierarchies, for interpreting the clusters of sequences and characterizing their components through insightful visualizations. Extensive experiments on both synthetic and real-world data sets demonstrate that data grid models are efficient, effective and discover meaningful underlying patterns of categorical time series data.


Sintel: A Machine Learning Framework to Extract Insights from Signals

Apr 19, 2022
Sarah Alnegheimish, Dongyu Liu, Carles Sala, Laure Berti-Equille, Kalyan Veeramachaneni

The detection of anomalies in time series data is a critical task with many monitoring applications. Existing systems often fail to encompass an end-to-end detection process, to facilitate comparative analysis of various anomaly detection methods, or to incorporate human knowledge to refine output. This precludes current methods from being used in real-world settings by practitioners who are not ML experts. In this paper, we introduce Sintel, a machine learning framework for end-to-end time series tasks such as anomaly detection. The framework uses state-of-the-art approaches to support all steps of the anomaly detection process. Sintel logs the entire anomaly detection journey, providing detailed documentation of anomalies over time. It enables users to analyze signals, compare methods, and investigate anomalies through an interactive visualization tool, where they can annotate, modify, create, and remove events. Using these annotations, the framework leverages human knowledge to improve the anomaly detection pipeline. We demonstrate the usability, efficiency, and effectiveness of Sintel through a series of experiments on three public time series datasets, as well as one real-world use case involving spacecraft experts tasked with anomaly analysis tasks. Sintel's framework, code, and datasets are open-sourced at

* This work is accepted by ACM SIGMOD/PODS International Conference on Management of Data (SIGMOD 2022) 

Local Score Dependent Model Explanation for Time Dependent Covariates

Aug 13, 2019
Xochitl Watts, Freddy Lecue

The use of deep neural networks to make high risk decisions creates a need for global and local explanations so that users and experts have confidence in the modeling algorithms. We introduce a novel technique to find global and local explanations for time series data used in binary classification machine learning systems. We identify the most salient of the original features used by a black box model to distinguish between classes. The explanation can be made on categorical, continuous, and time series data and can be generalized to any binary classification model. The analysis is conducted on time series data to train a long short-term memory deep neural network and uses the time dependent structure of the underlying features in the explanation. The proposed technique attributes weights to features to explain an observations risk of belonging to a class as a multiplicative factor of a base hazard rate. We use a variation of the Cox Proportional Hazards regression, a Generalized Additive Model, to explain the effect of variables upon the probability of an in-class response for a score output from the black box model. The covariates incorporate time dependence structure in the features so the explanation is inclusive of the underlying time series data structure.

* Work accepted as full paper for presentation at XAI (Explainable AI) workshop at Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI) 2019 in Macao, China - August 10-16, 2019 

Imaging Time-Series to Improve Classification and Imputation

Jun 01, 2015
Zhiguang Wang, Tim Oates

Inspired by recent successes of deep learning in computer vision, we propose a novel framework for encoding time series as different types of images, namely, Gramian Angular Summation/Difference Fields (GASF/GADF) and Markov Transition Fields (MTF). This enables the use of techniques from computer vision for time series classification and imputation. We used Tiled Convolutional Neural Networks (tiled CNNs) on 20 standard datasets to learn high-level features from the individual and compound GASF-GADF-MTF images. Our approaches achieve highly competitive results when compared to nine of the current best time series classification approaches. Inspired by the bijection property of GASF on 0/1 rescaled data, we train Denoised Auto-encoders (DA) on the GASF images of four standard and one synthesized compound dataset. The imputation MSE on test data is reduced by 12.18%-48.02% when compared to using the raw data. An analysis of the features and weights learned via tiled CNNs and DAs explains why the approaches work.

* Accepted by IJCAI-2015 ML track 

Innovations Autoencoder and its Application in Real-Time Anomaly Detection

Jun 23, 2021
Xinyi Wang, Lang Tong

An innovations sequence of a time series is a sequence of independent and identically distributed random variables with which the original time series has a causal representation. The innovation at a time is statistically independent of the prior history of the time series. As such, it represents the new information contained at present but not in the past. Because of its simple probability structure, an innovations sequence is the most efficient signature of the original. Unlike the principle or independent analysis (PCA/ICA) representations, an innovations sequence preserves not only the complete statistical properties but also the temporal order of the original time series. An long-standing open problem is to find a computationally tractable way to extract an innovations sequence of non-Gaussian processes. This paper presents a deep learning approach, referred to as Innovations Autoencoder (IAE), that extracts innovations sequences using a causal convolutional neural network. An application of IAE to nonparametric anomaly detection with unknown anomaly and anomaly-free models is also presented.


Deep Neural Network Ensembles for Time Series Classification

Mar 15, 2019
H. Ismail Fawaz, G. Forestier, J. Weber, L. Idoumghar, P. Muller

Deep neural networks have revolutionized many fields such as computer vision and natural language processing. Inspired by this recent success, deep learning started to show promising results for Time Series Classification (TSC). However, neural networks are still behind the state-of-the-art TSC algorithms, that are currently composed of ensembles of 37 non deep learning based classifiers. We attribute this gap in performance due to the lack of neural network ensembles for TSC. Therefore in this paper, we show how an ensemble of 60 deep learning models can significantly improve upon the current state-of-the-art performance of neural networks for TSC, when evaluated over the UCR/UEA archive: the largest publicly available benchmark for time series analysis. Finally, we show how our proposed Neural Network Ensemble (NNE) is the first time series classifier to outperform COTE while reaching similar performance to the current state-of-the-art ensemble HIVE-COTE.

* Accepted at IJCNN 2019 

Interpretable Time Series Clustering Using Local Explanations

Aug 01, 2022
Ozan Ozyegen, Nicholas Prayogo, Mucahit Cevik, Ayse Basar

This study focuses on exploring the use of local interpretability methods for explaining time series clustering models. Many of the state-of-the-art clustering models are not directly explainable. To provide explanations for these clustering algorithms, we train classification models to estimate the cluster labels. Then, we use interpretability methods to explain the decisions of the classification models. The explanations are used to obtain insights into the clustering models. We perform a detailed numerical study to test the proposed approach on multiple datasets, clustering models, and classification models. The analysis of the results shows that the proposed approach can be used to explain time series clustering models, specifically when the underlying classification model is accurate. Lastly, we provide a detailed analysis of the results, discussing how our approach can be used in a real-life scenario.


Evaluation of Local Explanation Methods for Multivariate Time Series Forecasting

Sep 18, 2020
Ozan Ozyegen, Igor Ilic, Mucahit Cevik

Being able to interpret a machine learning model is a crucial task in many applications of machine learning. Specifically, local interpretability is important in determining why a model makes particular predictions. Despite the recent focus on AI interpretability, there has been a lack of research in local interpretability methods for time series forecasting while the few interpretable methods that exist mainly focus on time series classification tasks. In this study, we propose two novel evaluation metrics for time series forecasting: Area Over the Perturbation Curve for Regression and Ablation Percentage Threshold. These two metrics can measure the local fidelity of local explanation models. We extend the theoretical foundation to collect experimental results on two popular datasets, \textit{Rossmann sales} and \textit{electricity}. Both metrics enable a comprehensive comparison of numerous local explanation models and find which metrics are more sensitive. Lastly, we provide heuristical reasoning for this analysis.


LSTM Fully Convolutional Networks for Time Series Classification

Sep 08, 2017
Fazle Karim, Somshubra Majumdar, Houshang Darabi, Shun Chen

Fully convolutional neural networks (FCN) have been shown to achieve state-of-the-art performance on the task of classifying time series sequences. We propose the augmentation of fully convolutional networks with long short term memory recurrent neural network (LSTM RNN) sub-modules for time series classification. Our proposed models significantly enhance the performance of fully convolutional networks with a nominal increase in model size and require minimal preprocessing of the dataset. The proposed Long Short Term Memory Fully Convolutional Network (LSTM-FCN) achieves state-of-the-art performance compared to others. We also explore the usage of attention mechanism to improve time series classification with the Attention Long Short Term Memory Fully Convolutional Network (ALSTM-FCN). Utilization of the attention mechanism allows one to visualize the decision process of the LSTM cell. Furthermore, we propose fine-tuning as a method to enhance the performance of trained models. An overall analysis of the performance of our model is provided and compared to other techniques.

* 7 pages, 3 figures and 2 tables 

Automated data-driven approach for gap filling in the time series using evolutionary learning

Mar 01, 2021
Mikhail Sarafanov, Nikolay O. Nikitin, Anna V. Kalyuzhnaya

Time series analysis is widely used in various fields of science and industry. However, the vast majority of the time series obtained from real sources contain a large number of gaps, have a complex character, and can contain incorrect or missed parts. So, it is useful to have a convenient, efficient, and flexible instrument to fill the gaps in the time series. In this paper, we propose an approach for filling the gaps by the evolutionary automatic machine learning, that is implemented as a part of the FEDOT framework. Automated identification of the optimal data-driven model structure allows the adopting of the gap filling strategy to the specific problem. As a case study, the multivariate sea surface height dataset is used. During the experimental studies, the proposed approach was compared with other gap-filling methods and the composite models allow obtaining the higher quality of the gap restoration.