
"Time Series Analysis": models, code, and papers

Empirical Analysis of Lifelog Data using Optimal Feature Selection based Unsupervised Logistic Regression (OFS-ULR) Model with Spark Streaming

Apr 12, 2022
Sadhana Tiwari, Sonali Agarwal

Recent advances in pervasive healthcare monitoring systems have led to the generation of huge amounts of lifelog data in real time. Chronic diseases are among the most serious health challenges in both developing and developed countries; according to the WHO, they account for 73% of all deaths and 60% of the global burden of disease. Chronic disease classification models are now harnessing the potential of lifelog data to explore better healthcare practices. This paper constructs an optimal feature selection-based unsupervised logistic regression model (OFS-ULR) to classify chronic diseases. Because lifelog data are sensitive, their analysis is demanding, and conventional classification models show limited performance. Designing new classifiers for chronic disease classification using lifelog data is therefore a pressing need. Building a good model depends on pre-processing the dataset, identifying important features, and then training a learning algorithm with suitable hyperparameters. The proposed approach improves on existing methods through a series of steps: (i) removing redundant or invalid instances, (ii) labelling the data using clustering and partitioning it into classes, (iii) identifying a suitable subset of features by applying either domain knowledge or a selection algorithm, (iv) tuning model hyperparameters to get the best results, and (v) evaluating performance in a Spark Streaming environment. Two time-series datasets are used in the experiments to compute accuracy, recall, precision, and F1-score. The experimental analysis demonstrates the suitability of the proposed approach compared with conventional classifiers: the newly constructed model achieved the highest accuracy and reduced training complexity among all the models compared.

* Data analytics and Machine learning in healthcare 
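Steps (ii) to (iv) of the pipeline can be sketched in plain Python. This is a hypothetical, minimal illustration, not the authors' OFS-ULR implementation: it assumes k-means with k = 2 for pseudo-labelling, a class-mean-gap filter as the feature "selection algorithm", and batch gradient-descent logistic regression with fixed hyperparameters. The function names and toy data are invented for the example, and Spark Streaming (step (v)) is omitted.

```python
import math


def two_means_labels(rows, iters=20):
    """Pseudo-label unlabelled rows with k-means (k = 2), standing in for step (ii)."""
    centers = [list(rows[0]), list(rows[-1])]  # naive deterministic init
    labels = [0] * len(rows)
    for _ in range(iters):
        for i, r in enumerate(rows):
            d = [sum((a - b) ** 2 for a, b in zip(r, c)) for c in centers]
            labels[i] = 0 if d[0] <= d[1] else 1
        for k in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_features(rows, labels, top_k):
    """Rank features by the gap between the two class means (a simple filter, step (iii))."""
    scores = []
    for j in range(len(rows[0])):
        c0 = [r[j] for r, lab in zip(rows, labels) if lab == 0]
        c1 = [r[j] for r, lab in zip(rows, labels) if lab == 1]
        scores.append(abs(sum(c0) / len(c0) - sum(c1) / len(c1)))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:top_k]


def train_logreg(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the logistic loss (step (iv), hyperparameters fixed here)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - t  # sigmoid(z) - target
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0


# Toy data: feature 0 separates two groups, features 1 and 2 are uninformative.
rows = [[0.0, 5.0, 1.0], [0.2, 4.0, 1.1], [0.1, 6.0, 0.9],
        [5.0, 5.0, 1.0], [5.2, 4.5, 1.0], [4.9, 5.5, 1.1]]
labels = two_means_labels(rows)          # [0, 0, 0, 1, 1, 1]
keep = select_features(rows, labels, 1)  # [0]
X = [[r[j] for j in keep] for r in rows]
w, b = train_logreg(X, labels)
```

In a real setting, step (iv) would search over `lr`, `epochs`, and the number of kept features rather than fixing them.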

Prediction of adverse events in Afghanistan: regression analysis of time series data grouped not by geographic dependencies

Feb 27, 2020
Krzysztof Fiok, Waldemar Karwowski, Maciej Wilamowski

The aim of this study was to tackle a difficult regression task on highly unbalanced data from the active theater of war in Afghanistan. Our focus was on predicting the number of negative events, without distinguishing their precise nature, given historical data on investments and negative events for each of 400 predefined Afghanistan districts. In contrast with previous research on the matter, we propose an approach to time series analysis that benefits from a non-conventional aggregation of these territorial entities. Through initial exploratory data analysis, we demonstrate that dividing the data according to our proposal makes it possible to identify strong trend and seasonal components in the selected target variable. Using this approach, we also tried to estimate which investment data are most important for prediction performance. Based on our exploratory analysis and previous research, we prepared 5 sets of independent variables that were fed to 3 machine learning regression models. The results, expressed as mean absolute and mean squared errors, indicate that leveraging historical data on the target variable allows for reasonable performance; unfortunately, however, the other proposed independent variables do not seem to improve prediction quality.

* 10 pages, 4 figures, 5 tables 
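The aggregate-then-evaluate workflow can be sketched as follows. This is a hypothetical illustration: the `group_of` mapping stands in for the paper's non-geographic grouping (whose exact rule is not given here), and the seasonal-naive baseline is not one of the paper's three regression models. It only shows how district-level counts roll up into group series and how MAE and MSE score a forecast that uses nothing but the target's own history.

```python
from collections import defaultdict


def aggregate(records, group_of):
    """Sum (district, month, count) records into one monthly series per group.

    `group_of` maps district -> group label (hypothetical stand-in for the
    paper's non-geographic grouping).
    """
    totals = defaultdict(lambda: defaultdict(int))
    for district, month, count in records:
        totals[group_of[district]][month] += count
    return {g: [months[m] for m in sorted(months)] for g, months in totals.items()}


def seasonal_naive(series, period):
    """Predict y[t] as y[t - period]; returns (predictions, actuals)."""
    return series[:-period], series[period:]


def mae(pred, actual):
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


def mse(pred, actual):
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)


records = [
    ("d1", 0, 1), ("d2", 0, 2), ("d3", 0, 5),
    ("d1", 1, 0), ("d2", 1, 1), ("d3", 1, 4),
    ("d1", 2, 2), ("d2", 2, 2), ("d3", 2, 6),
    ("d1", 3, 1), ("d2", 3, 0), ("d3", 3, 5),
]
series = aggregate(records, {"d1": "A", "d2": "A", "d3": "B"})  # A: [3, 1, 4, 1]
pred, actual = seasonal_naive(series["A"], 2)
print(mae(pred, actual), mse(pred, actual))  # 0.5 0.5
```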

T-WaveNet: Tree-Structured Wavelet Neural Network for Sensor-Based Time Series Analysis

Dec 10, 2020
Minhao Liu, Ailing Zeng, Qiuxia Lai, Qiang Xu

Sensor-based time series analysis is an essential task for applications such as activity recognition and brain-computer interfaces. Recently, features extracted with deep neural networks (DNNs) have been shown to be more effective than conventional hand-crafted ones. However, most of these solutions rely solely on the network to extract the application-specific information carried in the sensor data. Motivated by the fact that a small subset of the frequency components usually carries the primary information in sensor data, we propose a novel tree-structured wavelet neural network for sensor data analysis, namely T-WaveNet. Specifically, with T-WaveNet, we first conduct a power spectrum analysis of the sensor data and decompose the input signal into various frequency subbands accordingly. Then, we construct a tree-structured network in which each node (corresponding to a frequency subband) is built with an invertible neural network (INN) based wavelet transform. By doing so, T-WaveNet provides a more effective representation of sensor information than existing DNN-based techniques, and it achieves state-of-the-art performance on various sensor datasets, including UCI-HAR for activity recognition, OPPORTUNITY for gesture recognition, BCICIV2a for intention recognition, and NinaPro DB1 for muscular movement recognition.
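An exactly invertible wavelet step can be illustrated with the lifting scheme (split, predict, update), which is one common way to build invertible wavelet layers. In this sketch the predict and update steps are the fixed Haar ones; in an INN-based transform they would be learned, and this is an assumption for illustration rather than T-WaveNet's actual layer. Also, `decompose` always splits the low band (a standard dyadic tree), whereas T-WaveNet chooses which subbands to split from its power spectrum analysis.

```python
def haar_lift_forward(x):
    """One lifting step: split, predict (detail), update (approximation)."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    even, odd = x[0::2], x[1::2]
    detail = [o - e for e, o in zip(even, odd)]         # predict odd samples from even ones
    approx = [e + d / 2 for e, d in zip(even, detail)]  # update: approx = pairwise mean
    return approx, detail


def haar_lift_inverse(approx, detail):
    """Undo the lifting step exactly by running it backwards."""
    even = [a - d / 2 for a, d in zip(approx, detail)]
    odd = [e + d for e, d in zip(even, detail)]
    out = []
    for e, o in zip(even, odd):
        out.extend([e, o])
    return out


def decompose(x, levels):
    """Repeatedly split the low band: returns [detail_1, ..., detail_L, approx_L]."""
    bands = []
    for _ in range(levels):
        x, d = haar_lift_forward(x)
        bands.append(d)
    bands.append(x)
    return bands


bands = decompose([2.0, 4.0, 6.0, 8.0], 2)  # → [[2.0, 2.0], [4.0], [5.0]]
```

Because each step is inverted by simply reversing the arithmetic, no information is lost between subbands, which is the property that makes invertible wavelet layers attractive.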


Wavelet Selection and Employment for Side-Channel Disassembly

Jul 25, 2021
Random Gwinn, Mark A. Matties, Aviel D. Rubin

Side-channel analysis, originally used in cryptanalysis, is growing in use cases, both offensive and defensive. Wavelet analysis is a time-frequency analysis technique commonly employed across disciplines for a variety of purposes, and it is increasingly prevalent in the side-channel literature. This paper explores wavelet selection and analysis parameters for side-channel analysis, particularly power side-channel-based instruction disassembly and classification. Experiments are conducted on an ATmega328P microcontroller and a subset of the AVR instruction set. Classification performance is evaluated with a time-series convolutional neural network (CNN) at clock-cycle fidelity. This work demonstrates that wavelet selection and employment parameters have a meaningful impact on analysis outcomes. Practitioners should make informed decisions and consider optimizing these factors just as they would machine learning architecture and hyperparameters. We conclude that the gaus1 wavelet with scales 1-21 and a grayscale colormap provided the best balance of classification performance, time, and memory efficiency in our application.

* 9 pages, 8 figures, 4 tables 
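A continuous wavelet transform over scales 1-21 can be sketched with a first-derivative-of-Gaussian wavelet, which is the shape behind "gaus1". The normalization here is an assumption and may differ from library implementations such as PyWavelets, and the kernel truncation (`half_width`) is a hypothetical parameter. The grayscale-colormap rendering and the CNN classifier are omitted: the output is just the matrix of coefficients, one row per scale.

```python
import math


def dog_wavelet(t):
    """First derivative of a Gaussian (up to sign and normalization)."""
    return -t * math.exp(-t * t / 2.0)


def cwt(signal, scales, half_width=4.0):
    """Direct-sum CWT: W[s][i] = sum_j x[i + j] * psi(j / s) / sqrt(s)."""
    n = len(signal)
    rows = []
    for s in scales:
        k = int(math.ceil(half_width * s))  # truncate the kernel at +/- half_width * s
        kernel = [(j, dog_wavelet(j / s) / math.sqrt(s)) for j in range(-k, k + 1)]
        row = []
        for i in range(n):
            acc = 0.0
            for j, w in kernel:
                if 0 <= i + j < n:  # samples outside the trace are treated as absent
                    acc += signal[i + j] * w
            row.append(acc)
        rows.append(row)
    return rows


signal = [0.0] * 100 + [1.0] * 100      # a step, e.g. a sudden change in power draw
scalogram = cwt(signal, range(1, 22))   # 21 rows (scales 1-21) x 200 columns
```

Since the wavelet is odd, it responds to changes rather than constant levels: coefficients are near zero on flat stretches and large around the step, which is what makes the scalogram informative as a CNN input.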

Deep Learning for Real-time Gravitational Wave Detection and Parameter Estimation: Results with Advanced LIGO Data

Nov 08, 2017
Daniel George, E. A. Huerta

The recent Nobel-prize-winning detections of gravitational waves from merging black holes, and the subsequent detection of the collision of two neutron stars in coincidence with electromagnetic observations, have inaugurated a new era of multimessenger astrophysics. To enhance the scope of this emergent field of science, we pioneered the use of deep learning with convolutional neural networks that take time-series inputs for rapid detection and characterization of gravitational wave signals. This approach, Deep Filtering, was initially demonstrated using simulated LIGO noise. In this article, we present the extension of Deep Filtering to real LIGO data, for both detection and parameter estimation of gravitational waves from binary black hole mergers, using continuous data streams from multiple LIGO detectors. We demonstrate for the first time that machine learning can detect and estimate the true parameters of real events observed by LIGO. Our results show that Deep Filtering achieves similar sensitivities and lower errors compared to matched filtering while being far more computationally efficient and more resilient to glitches. This allows real-time processing of weak time-series signals in non-stationary, non-Gaussian noise with minimal resources, and also enables the detection of new classes of gravitational wave sources that may go unnoticed with existing detection algorithms. This unified framework for data analysis is ideally suited to enable coincident detection campaigns of gravitational waves and their multimessenger counterparts in real time.

* Physics Letters B, 778 (2018) 64-70 
* 6 pages, 7 figures; First application of deep learning to real LIGO events; Includes direct comparison against matched-filtering 
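For reference, the matched-filtering baseline that Deep Filtering is compared against can be sketched as sliding a normalized template over the data and picking the offset of maximum correlation. This is a toy, time-domain version assuming white noise and a single hypothetical template; real LIGO searches whiten the data and use large template banks in the frequency domain. It does not implement Deep Filtering itself.

```python
import math


def matched_filter(data, template):
    """Correlate a unit-norm template with the data at every offset (white-noise assumption)."""
    norm = math.sqrt(sum(v * v for v in template))
    tpl = [v / norm for v in template]
    m = len(tpl)
    return [sum(data[i + j] * tpl[j] for j in range(m))
            for i in range(len(data) - m + 1)]


def best_offset(scores):
    """Offset with the largest correlation (the candidate event time)."""
    return max(range(len(scores)), key=lambda i: scores[i])


template = [1.0, -1.0, 2.0, -2.0]  # toy chirp-like shape, purely illustrative
data = [0.1 * math.sin(3.0 * i) for i in range(20)]  # deterministic stand-in for noise
for j, v in enumerate(template):
    data[7 + j] += v  # inject the signal at offset 7
print(best_offset(matched_filter(data, template)))  # 7
```

The cost of this search grows with the number of templates times the data length, which is the computational burden the abstract contrasts with a single forward pass of a trained network.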

Scheduling Planting Time Through Developing an Optimization Model and Analysis of Time Series Growing Degree Units

Jul 02, 2022
Javad Ansarifar, Faezeh Akhavizadegan, Lizhi Wang

Producing higher-quality crops within shortened breeding cycles ensures global food availability and security, but this improvement intensifies logistical and productivity challenges for seed industries in the year-round breeding process because of storage limitations. In the 2021 Syngenta Crop Challenge in Analytics, Syngenta posed the problem of designing an optimization model for scheduling planting times in the 2020 year-round breeding process so that the harvest quantity is consistent from week to week. They released a dataset containing 2569 seed populations with their planting windows, the growing degree units required for harvesting, and their harvest quantities at two sites. To address this challenge, we developed a new framework consisting of a weather time series model and an optimization model to schedule planting times. A deep recurrent neural network was designed to forecast the weather, and a Gaussian process model on top of the time series model was developed to capture the uncertainty of the forecasted weather. The proposed optimization models also scheduled the seed populations' planting times over the fewest number of weeks with a more consistent weekly harvest quantity. Using the proposed optimization models can decrease the required capacity by 69% at site 0 and by up to 51% at site 1 compared to the original planting times.
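The link between planting day, growing degree units (GDU), and harvest week can be sketched as follows. The daily GDU rule uses the common corn 86/50 °F cut-offs and is an assumption for illustration (the challenge data supplied required GDU per population); the daily GDU series would come from the weather forecast, and the optimization model that chooses each `plant_day` is not shown. All names are hypothetical.

```python
def daily_gdu(t_max_f, t_min_f):
    """Corn-style GDU with 86/50 degF cut-offs (an assumed rule for illustration)."""
    hi = min(max(t_max_f, 50.0), 86.0)
    lo = min(max(t_min_f, 50.0), 86.0)
    return (hi + lo) / 2.0 - 50.0


def harvest_day(gdu_by_day, plant_day, required_gdu):
    """First day on which GDU accumulated since planting reaches the requirement."""
    total = 0.0
    for day in range(plant_day, len(gdu_by_day)):
        total += gdu_by_day[day]
        if total >= required_gdu:
            return day
    return None  # requirement not reached within the forecast horizon


def weekly_harvest(populations, gdu_by_day):
    """populations: (plant_day, required_gdu, quantity) tuples -> {week: quantity}."""
    weeks = {}
    for plant_day, required, qty in populations:
        day = harvest_day(gdu_by_day, plant_day, required)
        if day is not None:
            weeks[day // 7] = weeks.get(day // 7, 0) + qty
    return weeks


gdu = [10.0] * 70  # toy forecast: 10 GDU every day for 10 weeks
print(weekly_harvest([(0, 100, 5), (7, 100, 3), (0, 150, 2)], gdu))  # {1: 5, 2: 5}
```

Under one plausible reading of the abstract, the "required capacity" is the largest weekly quantity, `max(weeks.values())`, so a scheduler would shift planting days to flatten this dictionary.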


Memory-free Online Change-point Detection: A Novel Neural Network Approach

Jul 08, 2022
Zahra Atashgahi, Decebal Constantin Mocanu, Raymond Veldhuis, Mykola Pechenizkiy

Change-point detection (CPD), which detects abrupt changes in the data distribution, is recognized as one of the most significant tasks in time series analysis. Despite the extensive literature on offline CPD, unsupervised online CPD still suffers from major challenges, including scalability, hyperparameter tuning, and learning constraints. To mitigate some of these challenges, in this paper, we propose a novel deep learning approach for unsupervised online CPD from multi-dimensional time series, named Adaptive LSTM-Autoencoder Change-Point Detection (ALACPD). ALACPD exploits an LSTM-autoencoder-based neural network to perform unsupervised online CPD. It continuously adapts to incoming samples without keeping previously received input, and is thus memory-free. We perform an extensive evaluation on several real-world time series CPD benchmarks. We show that ALACPD, on average, ranks first among state-of-the-art CPD algorithms in terms of the quality of the time series segmentation, and that it is on par with the best performer in terms of the accuracy of the estimated change-points. The implementation of ALACPD is available online on GitHub.
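The "memory-free" idea, adapting to each incoming sample while storing only a fixed-size state, can be illustrated with a much simpler detector. This sketch swaps in exponentially weighted running statistics and a z-score threshold; it is not ALACPD, which instead adapts an LSTM-autoencoder and flags change-points from its reconstruction error. The class name and the `alpha`, `threshold`, and `warmup` parameters are hypothetical.

```python
import math


class EwmaChangeDetector:
    """Online detector keeping only running mean/variance, never past samples."""

    def __init__(self, alpha=0.1, threshold=4.0, warmup=5):
        self.alpha = alpha          # adaptation rate of the running statistics
        self.threshold = threshold  # z-score that counts as a change
        self.warmup = warmup        # samples to see before testing for changes
        self.mean, self.var, self.n = 0.0, 0.0, 0

    def update(self, x):
        """Consume one sample; return True if it looks like the start of a new regime."""
        if self.n >= self.warmup:
            std = math.sqrt(self.var) if self.var > 0 else 1e-9
            if abs(x - self.mean) / std > self.threshold:
                self.mean, self.var, self.n = x, 0.0, 1  # restart from the new regime
                return True
        if self.n == 0:
            self.mean = x
        else:
            diff = x - self.mean
            incr = self.alpha * diff
            self.mean += incr
            self.var = (1 - self.alpha) * (self.var + diff * incr)  # EWMA variance
        self.n += 1
        return False


detector = EwmaChangeDetector()
signal = [0.5 * (-1) ** i for i in range(30)] + [10 + 0.5 * (-1) ** i for i in range(30)]
flags = [i for i, x in enumerate(signal) if detector.update(x)]  # [30]
```

A learned model such as an autoencoder replaces the mean/variance state with network weights, which lets it capture multi-dimensional structure that a per-value z-score cannot.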


Wavelet-based clustering for time-series trend detection

Nov 17, 2020
Vincent Talbo, Mehdi Haddab, Derek Aubert, Redha Moulla

In this paper, we introduce a method that clusters time series on the basis of their trend (increasing, stagnating/decreasing, and seasonal behavior). The clustering is performed using the k-means method on a selection of coefficients obtained by discrete wavelet transform, drastically reducing the dimensionality. The method is applied to a use case: the clustering of an 864 daily sales revenue time series for 61 retail shops. The results are presented for different mother wavelets. The importance of each wavelet coefficient and its level is discussed with the help of a principal component analysis, along with a reconstruction of the signal from the selected wavelet coefficients.

* 10 pages, 11 figures 
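The core of the method, k-means on a reduced set of discrete wavelet coefficients, can be sketched as below. Assumptions for illustration: the Haar mother wavelet, the "selection" being just the coarsest approximation coefficients, and a deterministic k-means initialization; the paper compares several mother wavelets and discusses coefficient importance via PCA, none of which is reproduced here.

```python
import math


def haar_dwt(x, levels):
    """Orthonormal Haar DWT: returns (coarsest approximation, detail bands)."""
    details = []
    for _ in range(levels):
        pairs = range(0, len(x) - 1, 2)
        approx = [(x[i] + x[i + 1]) / math.sqrt(2) for i in pairs]
        details.append([(x[i] - x[i + 1]) / math.sqrt(2) for i in pairs])
        x = approx
    return x, details


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with a deterministic init (the first k points)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_by_trend(series_list, levels, k):
    """Cluster series on their coarsest approximation coefficients only."""
    features = [haar_dwt(s, levels)[0] for s in series_list]
    return kmeans(features, k)


inc1 = [float(i) for i in range(16)]
dec1 = [16.0 - i for i in range(16)]
inc2 = [1.2 * i for i in range(16)]
dec2 = [20.0 - 1.1 * i for i in range(16)]
trend_labels = cluster_by_trend([inc1, dec1, inc2, dec2], levels=2, k=2)  # [0, 1, 0, 1]
```

Two levels of the transform reduce each 16-point series to 4 coefficients, which is the kind of dimensionality reduction the abstract refers to.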

Efficiently Discovering Frequent Motifs in Large-scale Sensor Data

Jan 02, 2015
Puneet Agarwal, Gautam Shroff, Sarmimala Saikia, Zaigham Khan

While analyzing vehicular sensor data, we found that frequently occurring waveforms could serve as features for further analysis, such as rule mining, classification, and anomaly detection. The discovery of waveform patterns, also known as time-series motifs, has been studied extensively; however, available techniques for discovering frequently occurring time-series motifs were found lacking in either efficiency or quality. Standard subsequence clustering yields poor quality, to the extent that it has even been termed 'meaningless'. Variants of hierarchical clustering that use techniques for the efficient discovery of 'exact pair motifs' find high-quality frequent motifs, but at the cost of high computational complexity, making such techniques unusable for our voluminous vehicular sensor data. We show that good-quality frequent motifs can be discovered using bounded spherical clustering of time-series subsequences, which we refer to as COIN clustering, with near-linear complexity in time-series size. COIN clustering addresses many of the challenges that previously led to subsequence clustering being viewed as meaningless. We describe an end-to-end motif-discovery procedure using a sequence of pre- and post-processing techniques that remove trivial matches and shifted motifs, which also plagued previous subsequence-clustering approaches. We demonstrate that our technique efficiently discovers frequent motifs in voluminous vehicular sensor data as well as in publicly available data sets.

* 13 pages, 8 figures, Technical Report 
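A single-pass, radius-bounded clustering of z-normalized subsequences, with overlapping (trivial) matches skipped, can be sketched as follows. This is a simplified, hypothetical illustration inspired by the description above, not the COIN algorithm: clusters are centred on their first member (leader clustering), `radius` is an assumed parameter, and shifted-motif removal is not attempted.

```python
import math


def znorm(seq):
    """Z-normalize a subsequence so motifs are compared by shape, not level or scale."""
    m = sum(seq) / len(seq)
    sd = math.sqrt(sum((v - m) ** 2 for v in seq) / len(seq))
    return [(v - m) / sd for v in seq] if sd > 0 else [0.0] * len(seq)


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bounded_motifs(series, w, radius, min_count=2):
    """Each window joins the first cluster within `radius`, else starts a new one.

    A window that overlaps the cluster's most recent member is a trivial match
    and is not counted again.
    """
    clusters = []  # each: {"center": [...], "starts": [...]}
    for s in range(len(series) - w + 1):
        sub = znorm(series[s:s + w])
        for c in clusters:
            if dist(sub, c["center"]) <= radius:
                if s - c["starts"][-1] >= w:  # skip overlapping (trivial) matches
                    c["starts"].append(s)
                break
        else:
            clusters.append({"center": sub, "starts": [s]})
    return [c["starts"] for c in clusters if len(c["starts"]) >= min_count]


ramp = [float(v) for v in range(10)]
print(bounded_motifs(ramp, 3, 0.5))  # [[0, 3, 6]]
```

On the ramp every window has the same shape, so without the overlap check all eight windows would be counted as one motif; with it, only non-overlapping occurrences remain.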

National-scale electricity peak load forecasting: Traditional, machine learning, or hybrid model?

Jun 30, 2021
Juyong Lee, Youngsang Cho

As the volatility of electricity demand increases owing to climate change and electrification, accurate peak load forecasting is becoming more important. Traditionally, peak load forecasting has been conducted with time series-based models; recently, however, new models based on machine or deep learning have been introduced. This study performs a comparative analysis to determine the most accurate peak load-forecasting model for Korea by comparing the performance of time series, machine learning, and hybrid models. Seasonal autoregressive integrated moving average with exogenous variables (SARIMAX) is used for the time series model. Artificial neural network (ANN), support vector regression (SVR), and long short-term memory (LSTM) are used for the machine learning models. SARIMAX-ANN, SARIMAX-SVR, and SARIMAX-LSTM are used for the hybrid models. The results indicate that the hybrid models exhibit significant improvement over the SARIMAX model. The LSTM-based models outperformed the others; the single and hybrid LSTM models did not exhibit a significant performance difference. In the case of Korea's highest peak load in 2019, the predictive power of the LSTM model proved to be greater than that of the SARIMAX-LSTM model. The LSTM, SARIMAX-SVR, and SARIMAX-LSTM models outperformed the time series-based forecasting model currently used in Korea. Thus, Korea's peak load-forecasting performance can be improved by including machine learning or hybrid models.
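One common way to build such hybrids, assumed here because the abstract does not specify the coupling, is additive: fit the linear time series model, then let a second model learn its residuals and add the two forecasts. In this toy sketch an AR(1) fitted by least squares stands in for SARIMAX, and a per-season average of residuals stands in for the ANN, SVR, or LSTM residual learner; exogenous variables and differencing are omitted.

```python
def fit_ar1(y):
    """Least-squares fit of y[t] = c + phi * y[t-1]."""
    x, t = y[:-1], y[1:]
    n = len(x)
    mx, mt = sum(x) / n, sum(t) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxt = sum((a - mx) * (b - mt) for a, b in zip(x, t))
    phi = sxt / sxx
    return mt - phi * mx, phi


def fit_hybrid(y, period):
    """Linear stage, then a stand-in residual learner (mean residual per season)."""
    c, phi = fit_ar1(y)
    resid = [y[i] - (c + phi * y[i - 1]) for i in range(1, len(y))]
    sums, counts = [0.0] * period, [0] * period
    for i, r in enumerate(resid, start=1):
        sums[i % period] += r
        counts[i % period] += 1
    season = [s / cnt if cnt else 0.0 for s, cnt in zip(sums, counts)]
    return c, phi, season


def forecast_next(y, model):
    """Hybrid one-step forecast = linear forecast + residual-model forecast."""
    c, phi, season = model
    return c + phi * y[-1] + season[len(y) % len(season)]


y = [0.0, 0.0, 3.0] * 4 + [0.0]
model = fit_hybrid(y, period=3)  # c = 1.5, phi = -0.5
print(forecast_next(y, model))   # 0.0 (the AR(1) stage alone would predict 1.5)
```

The residual stage captures the structure the linear stage misses, which mirrors the rationale for pairing SARIMAX with a nonlinear learner.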