Get our free extension to see links to code for papers anywhere online!

Chrome logo  Add to Chrome

Firefox logo Add to Firefox

"Time Series Analysis": models, code, and papers

Detection of Anomalies in a Time Series Data using InfluxDB and Python

Dec 15, 2020
Tochukwu John Anih, Chika Amadi Bede, Chima Festus Umeokpala

Analysis of water and environmental data is an important aspect of many intelligent water and environmental system applications where inference from such analysis plays a significant role in decision making. Quite often these data that are collected through sensible sensors can be anomalous due to different reasons such as systems breakdown, malfunctioning of sensor detectors, and more. Regardless of their root causes, such data severely affect the results of the subsequent analysis. This paper demonstrates data cleaning and preparation for time-series data and further proposes cost-sensitive machine learning algorithms as a solution to detect anomalous data points in time-series data. The following models: Logistic Regression, Random Forest, Support Vector Machines have been modified to support the cost-sensitive learning which penalizes misclassified samples thereby minimizing the total misclassification cost. Our results showed that Random Forest outperformed the rest of the models at predicting the positive class (i.e anomalies). Applying predictive model improvement techniques like data oversampling seems to provide little or no improvement to the Random Forest model. Interestingly, with recursive feature elimination, we achieved a better model performance thereby reducing the dimensions in the data. Finally, with Influxdb and Kapacitor the data was ingested and streamed to generate new data points to further evaluate the model performance on unseen data, this will allow for early recognition of undesirable changes in the drinking water quality and will enable the water supply companies to rectify on a timely basis whatever undesirable changes abound.

* 12 pages, 9 figures, 4 tables 
  

EgPDE-Net: Building Continuous Neural Networks for Time Series Prediction with Exogenous Variables

Aug 03, 2022
Penglei Gao, Xi Yang, Kaizhu Huang, Rui Zhang, Ping Guo, John Y. Goulermas

While exogenous variables have a major impact on performance improvement in time series analysis, inter-series correlation and time dependence among them are rarely considered in the present continuous methods. The dynamical systems of multivariate time series could be modelled with complex unknown partial differential equations (PDEs) which play a prominent role in many disciplines of science and engineering. In this paper, we propose a continuous-time model for arbitrary-step prediction to learn an unknown PDE system in multivariate time series whose governing equations are parameterised by self-attention and gated recurrent neural networks. The proposed model, \underline{E}xogenous-\underline{g}uided \underline{P}artial \underline{D}ifferential \underline{E}quation Network (EgPDE-Net), takes account of the relationships among the exogenous variables and their effects on the target series. Importantly, the model can be reduced into a regularised ordinary differential equation (ODE) problem with special designed regularisation guidance, which makes the PDE problem tractable to obtain numerical solutions and feasible to predict multiple future values of the target series at arbitrary time points. Extensive experiments demonstrate that our proposed model could achieve competitive accuracy over strong baselines: on average, it outperforms the best baseline by reducing $9.85\%$ on RMSE and $13.98\%$ on MAE for arbitrary-step prediction.

  

Time Series Forecasting Using Manifold Learning

Oct 21, 2021
Panagiotis Papaioannou, Ronen Talmon, Daniela di Serafino, Ioannis Kevrekidis, Constantinos Siettos

We address a three-tier numerical framework based on manifold learning for the forecasting of high-dimensional time series. At the first step, we embed the time series into a reduced low-dimensional space using a nonlinear manifold learning algorithm such as Locally Linear Embedding and Diffusion Maps. At the second step, we construct reduced-order regression models on the manifold, in particular Multivariate Autoregressive (MVAR) and Gaussian Process Regression (GPR) models, to forecast the embedded dynamics. At the final step, we lift the embedded time series back to the original high-dimensional space using Radial Basis Functions interpolation and Geometric Harmonics. For our illustrations, we test the forecasting performance of the proposed numerical scheme with four sets of time series: three synthetic stochastic ones resembling EEG signals produced from linear and nonlinear stochastic models with different model orders, and one real-world data set containing daily time series of 10 key foreign exchange rates (FOREX) spanning the time period 03/09/2001-29/10/2020. The forecasting performance of the proposed numerical scheme is assessed using the combinations of manifold learning, modelling and lifting approaches. We also provide a comparison with the Principal Component Analysis algorithm as well as with the naive random walk model and the MVAR and GPR models trained and implemented directly in the high-dimensional space.

  

Interpretable Time-series Representation Learning With Multi-Level Disentanglement

May 17, 2021
Yuening Li, Zhengzhang Chen, Daochen Zha, Mengnan Du, Denghui Zhang, Haifeng Chen, Xia Hu

Time-series representation learning is a fundamental task for time-series analysis. While significant progress has been made to achieve accurate representations for downstream applications, the learned representations often lack interpretability and do not expose semantic meanings. Different from previous efforts on the entangled feature space, we aim to extract the semantic-rich temporal correlations in the latent interpretable factorized representation of the data. Motivated by the success of disentangled representation learning in computer vision, we study the possibility of learning semantic-rich time-series representations, which remains unexplored due to three main challenges: 1) sequential data structure introduces complex temporal correlations and makes the latent representations hard to interpret, 2) sequential models suffer from KL vanishing problem, and 3) interpretable semantic concepts for time-series often rely on multiple factors instead of individuals. To bridge the gap, we propose Disentangle Time Series (DTS), a novel disentanglement enhancement framework for sequential data. Specifically, to generate hierarchical semantic concepts as the interpretable and disentangled representation of time-series, DTS introduces multi-level disentanglement strategies by covering both individual latent factors and group semantic segments. We further theoretically show how to alleviate the KL vanishing problem: DTS introduces a mutual information maximization term, while preserving a heavier penalty on the total correlation and the dimension-wise KL to keep the disentanglement property. Experimental results on various real-world benchmark datasets demonstrate that the representations learned by DTS achieve superior performance in downstream applications, with high interpretability of semantic concepts.

  

A Non-linear Function-on-Function Model for Regression with Time Series Data

Nov 24, 2020
Qiyao Wang, Haiyan Wang, Chetan Gupta, Aniruddha Rajendra Rao, Hamed Khorasgani

In the last few decades, building regression models for non-scalar variables, including time series, text, image, and video, has attracted increasing interests of researchers from the data analytic community. In this paper, we focus on a multivariate time series regression problem. Specifically, we aim to learn mathematical mappings from multiple chronologically measured numerical variables within a certain time interval S to multiple numerical variables of interest over time interval T. Prior arts, including the multivariate regression model, the Seq2Seq model, and the functional linear models, suffer from several limitations. The first two types of models can only handle regularly observed time series. Besides, the conventional multivariate regression models tend to be biased and inefficient, as they are incapable of encoding the temporal dependencies among observations from the same time series. The sequential learning models explicitly use the same set of parameters along time, which has negative impacts on accuracy. The function-on-function linear model in functional data analysis (a branch of statistics) is insufficient to capture complex correlations among the considered time series and suffer from underfitting easily. In this paper, we propose a general functional mapping that embraces the function-on-function linear model as a special case. We then propose a non-linear function-on-function model using the fully connected neural network to learn the mapping from data, which addresses the aforementioned concerns in the existing approaches. For the proposed model, we describe in detail the corresponding numerical implementation procedures. The effectiveness of the proposed model is demonstrated through the application to two real-world problems.

* Accepted by IEEE Big Data 2020 
  

Correlated daily time series and forecasting in the M4 competition

Mar 31, 2020
Anti Ingel, Novin Shahroudi, Markus Kängsepp, Andre Tättar, Viacheslav Komisarenko, Meelis Kull

We participated in the M4 competition for time series forecasting and describe here our methods for forecasting daily time series. We used an ensemble of five statistical forecasting methods and a method that we refer to as the correlator. Our retrospective analysis using the ground truth values published by the M4 organisers after the competition demonstrates that the correlator was responsible for most of our gains over the naive constant forecasting method. We identify data leakage as one reason for its success, partly due to test data selected from different time intervals, and partly due to quality issues in the original time series. We suggest that future forecasting competitions should provide actual dates for the time series so that some of those leakages could be avoided by the participants.

* International Journal of Forecasting, 36(1), 121-128 (2020) 
  

Correlated daily time series and forecasting in M4 competition

Mar 28, 2020
Anti Ingel, Novin Shahroudi, Markus Kängsepp, Andre Tättar, Viacheslav Komisarenko, Meelis Kull

We participated in the M4 competition for time series forecasting and describe here our methods for forecasting daily time series. We used an ensemble of five statistical forecasting methods and a method that we refer to as the correlator. Our retrospective analysis using the ground truth values published by the M4 organisers after the competition demonstrates that the correlator was responsible for most of our gains over the naive constant forecasting method. We identify data leakage as one reason for its success, partly due to test data selected from different time intervals, and partly due to quality issues in the original time series. We suggest future forecasting competitions to provide actual dates for the time series so that some of those leakages could be avoided by the participants.

* International Journal of Forecasting, 36(1), 121-128 (2020) 
  

Time-Series Analysis via Low-Rank Matrix Factorization Applied to Infant-Sleep Data

Apr 10, 2019
Sheng Liu, Mark Cheng, Hayley Brooks, Wayne Mackey, David J. Heeger, Esteban G. Tabak, Carlos Fernandez-Granda

We propose a nonparametric model for time series with missing data based on low-rank matrix factorization. The model expresses each instance in a set of time series as a linear combination of a small number of shared basis functions. Constraining the functions and the corresponding coefficients to be nonnegative yields an interpretable low-dimensional representation of the data. A time-smoothing regularization term ensures that the model captures meaningful trends in the data, instead of overfitting short-term fluctuations. The low-dimensional representation makes it possible to detect outliers and cluster the time series according to the interpretable features extracted by the model, and also to perform forecasting via kernel regression. We apply our methodology to a large real-world dataset of infant-sleep data gathered by caregivers with a mobile-phone app. Our analysis automatically extracts daily-sleep patterns consistent with the existing literature. This allows us to compute sleep-development trends for the cohort, which characterize the emergence of circadian sleep and different napping habits. We apply our methodology to detect anomalous individuals, to cluster the cohort into groups with different sleeping tendencies, and to obtain improved predictions of future sleep behavior.

  

Machine learning applications in time series hierarchical forecasting

Dec 01, 2019
Mahdi Abolghasemi, Rob J Hyndman, Garth Tarr, Christoph Bergmeir

Hierarchical forecasting (HF) is needed in many situations in the supply chain (SC) because managers often need different levels of forecasts at different levels of SC to make a decision. Top-Down (TD), Bottom-Up (BU) and Optimal Combination (COM) are common HF models. These approaches are static and often ignore the dynamics of the series while disaggregating them. Consequently, they may fail to perform well if the investigated group of time series are subject to large changes such as during the periods of promotional sales. We address the HF problem of predicting real-world sales time series that are highly impacted by promotion. We use three machine learning (ML) models to capture sales variations over time. Artificial neural networks (ANN), extreme gradient boosting (XGboost), and support vector regression (SVR) algorithms are used to estimate the proportions of lower-level time series from the upper level. We perform an in-depth analysis of 61 groups of time series with different volatilities and show that ML models are competitive and outperform some well-established models in the literature.

  
<<
8
9
10
11
12
13
14
15
16
17
18
19
20
>>