The time series captured by a single scalp electrode (plus the reference electrode) of refractory epilepsy patients is used to forecast seizure susceptibility. The time series is preprocessed, segmented, and each segment transformed into an image using three well-known methods: Recurrence Plot, Gramian Angular Field, and Markov Transition Field. The likelihood of a seizure occurring within a predefined future time window is computed by averaging the output of the softmax layer of a CNN, rather than taking the output of the classification layer as is usually done. Thresholding this likelihood improves seizure-forecasting performance; interestingly, for almost every patient the best threshold differed from 50%. The results show that this technique predicts well for some seizures and patients. However, more tests, namely with more patients and more seizures, are needed to better understand the real potential of this technique.
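A minimal sketch of the segment-to-image step, assuming the pyts library as the implementation (the abstract does not name one). Each preprocessed EEG segment is encoded three ways, and the resulting images can be stacked as CNN input channels; the segment length and image sizes below are illustrative.

```python
# Encode one EEG segment as Recurrence Plot, Gramian Angular Field,
# and Markov Transition Field images (pyts is an assumed implementation).
import numpy as np
from pyts.image import RecurrencePlot, GramianAngularField, MarkovTransitionField

segment = np.random.randn(1, 512)  # stand-in for one preprocessed EEG segment

rp_img  = RecurrencePlot(threshold='point', percentage=20).fit_transform(segment)
gaf_img = GramianAngularField(image_size=64, method='summation').fit_transform(segment)
mtf_img = MarkovTransitionField(image_size=64, n_bins=8).fit_transform(segment)

# Each result has shape (1, H, W); the three encodings can be stacked
# as channels of a single CNN input.
print(rp_img.shape, gaf_img.shape, mtf_img.shape)
```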
The Dynamic Time Warping ("DTW") distance is widely used in time series analysis, be it for classification, clustering, or similarity search. However, its quadratic time complexity prevents it from scaling. Strategies based on early abandoning DTW, or on skipping its computation altogether thanks to lower bounds, have been developed for certain use cases such as nearest-neighbour search. But vectorization and approximation aside, no advance was made on DTW itself until the recent introduction of PrunedDTW. This algorithm, able to prune unpromising alignments, was later fitted with early abandoning. We present a new version of PrunedDTW, "EAPrunedDTW", designed with early abandoning in mind from the start and able to early abandon faster than before. We show that EAPrunedDTW significantly improves the computation time of similarity search in the UCR Suite, and renders lower bounds dispensable.
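To make the early-abandoning idea concrete, here is a minimal sketch of DTW that abandons a computation as soon as an entire row of the cost matrix exceeds a cutoff (e.g., the best-so-far distance in nearest-neighbour search). It is illustrative only and omits the alignment-pruning bookkeeping that distinguishes PrunedDTW and EAPrunedDTW.

```python
# DTW (squared Euclidean cost) with row-wise early abandoning.
import math

def dtw_early_abandon(a, b, cutoff=math.inf):
    n, m = len(a), len(b)
    prev = [0.0] + [math.inf] * m          # DP border row
    for i in range(n):
        curr = [math.inf] * (m + 1)
        row_min = math.inf
        for j in range(m):
            cost = (a[i] - b[j]) ** 2
            curr[j + 1] = cost + min(prev[j], prev[j + 1], curr[j])
            row_min = min(row_min, curr[j + 1])
        if row_min > cutoff:               # no alignment can beat the cutoff
            return math.inf                # early abandon
        prev = curr
    return prev[m]

print(dtw_early_abandon([1, 2, 3, 4], [1, 2, 2, 4], cutoff=10.0))
```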
Financial markets are a source of non-stationary multidimensional time series that have been drawing attention for decades. Each financial instrument has its own time-varying properties, which makes the analysis a complex task. Improving the understanding of, and developing methods for, financial time series analysis is essential for successful operation in financial markets. In this study we propose a volume-based data pre-processing method that makes financial time series more suitable for machine learning pipelines. We use a statistical approach to assess the performance of the method: we formally state the hypotheses, set up the associated classification tasks, compute effect sizes with confidence intervals, and run statistical tests to validate the hypotheses. We additionally assess the trading performance of the proposed method on historical data and compare it to a previously published approach. Our analysis shows that the proposed volume-based method allows successful classification of financial time series patterns, and also leads to better classification performance than a price-action-based method, excelling specifically on more liquid financial instruments. Finally, we propose an approach for obtaining feature interactions directly from tree-based models, using the CatBoost estimator as an example, and formally assess the relatedness of the proposed approach and SHAP feature interactions, with a positive outcome.
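The abstract does not spell out its pre-processing method, so the following is a hedged sketch of one common volume-based transformation: resampling tick or minute data into "volume bars" that each contain a fixed traded volume, which tends to stabilize the statistical properties of the series.

```python
# Resample tick data into fixed-volume OHLC bars (an illustrative
# volume-based pre-processing step; the paper's exact method may differ).
import pandas as pd

def volume_bars(df, bar_volume):
    """df: DataFrame with 'price' and 'volume' columns; returns OHLC bars
    closed each time cumulative volume crosses `bar_volume`."""
    bar_id = (df['volume'].cumsum() // bar_volume).astype(int)
    grouped = df.groupby(bar_id)
    return pd.DataFrame({
        'open':   grouped['price'].first(),
        'high':   grouped['price'].max(),
        'low':    grouped['price'].min(),
        'close':  grouped['price'].last(),
        'volume': grouped['volume'].sum(),
    })

ticks = pd.DataFrame({'price':  [100, 101, 100.5, 102, 101.5],
                      'volume': [300, 500, 400, 700, 100]})
print(volume_bars(ticks, bar_volume=1000))
```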
In time-series analysis, time series motifs and order patterns can reveal general temporal patterns and dynamic features. The Triadic Motif Field (TMF) is a simple and effective time-series image encoding method based on triadic time series motifs. Electrocardiography (ECG) signals are time-series data widely used to diagnose various cardiac anomalies, and TMF images contain the features characterizing normal and Atrial Fibrillation (AF) ECG signals. Given the quasi-periodic character of ECG signals, dynamic features can be extracted from the TMF images with transfer-learning pre-trained convolutional neural network (CNN) models. With the extracted features, simple classifiers such as the Multi-Layer Perceptron (MLP), logistic regression, and the random forest can be applied for accurate anomaly detection. On the test dataset of the PhysioNet Challenge 2017 database, the TMF classification model with the VGG16 transfer-learning backbone and an MLP classifier demonstrates the best performance, with a 95.50% ROC-AUC and an 88.43% F1 score on AF classification. Moreover, the TMF classification model can identify AF patients in the test dataset with high precision, and the feature vectors extracted from the TMF images show clear patient-wise clustering under the t-distributed Stochastic Neighbor Embedding technique. Above all, the TMF classification model has very good clinical interpretability: the patterns revealed by symmetrized Gradient-weighted Class Activation Mapping have a clear clinical interpretation at the beat and rhythm levels.
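A sketch of the transfer-learning feature-extraction step: a pre-trained VGG16 (ImageNet weights, top removed) turns each TMF image into a fixed-length feature vector for a downstream classifier. The image size, random stand-in data, and MLP hyper-parameters are assumptions, not the paper's exact settings.

```python
# Extract VGG16 features from TMF-style images and fit a simple MLP classifier.
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from sklearn.neural_network import MLPClassifier

backbone = VGG16(weights='imagenet', include_top=False, pooling='avg')

tmf_images = np.random.rand(8, 224, 224, 3) * 255.0   # stand-in TMF images
labels = np.random.randint(0, 2, size=8)              # 0 = normal, 1 = AF

features = backbone.predict(preprocess_input(tmf_images))   # shape (8, 512)
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(features, labels)
print(clf.predict(features[:2]))
```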
Feature-based time series representation has attracted substantial attention in a wide range of time series analysis methods. Recently, the use of time series features for forecast model selection and model averaging has become an emerging research focus in the forecasting community. Nonetheless, most existing approaches depend on a manual choice of an appropriate set of features. Exploiting machine learning methods to automatically extract features from time series is therefore crucially important for state-of-the-art time series analysis. In this paper, we introduce an automated, image-based approach to extracting time series features. Time series are first transformed into recurrence images, from which local features are extracted using computer vision algorithms. The extracted features are then used for forecast model selection and model averaging. Our experiments show that forecasting based on automatically extracted features, with less human intervention and a more comprehensive view of the raw time series data, performs comparably to the best methods proposed in the largest forecasting competition, M4.
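A hedged sketch of the image-based pipeline: a series becomes a recurrence image (here via pyts, an assumed implementation), and local descriptors are extracted with a standard computer-vision detector (ORB here; the paper's exact detector is not specified). The descriptors would then feed model selection or averaging.

```python
# Recurrence image -> local computer-vision features (illustrative choices).
import numpy as np
import cv2
from pyts.image import RecurrencePlot

series = np.sin(np.linspace(0, 20, 256)).reshape(1, -1)
rec = RecurrencePlot(threshold=None).fit_transform(series)[0]

# Rescale the recurrence matrix to an 8-bit grayscale image for OpenCV.
img = (255 * (rec - rec.min()) / (rec.ptp() + 1e-9)).astype(np.uint8)
keypoints, descriptors = cv2.ORB_create(nfeatures=200).detectAndCompute(img, None)
print(len(keypoints), None if descriptors is None else descriptors.shape)
```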
Lung cancer is the leading cause of cancer-related mortality worldwide. Lung cancer screening (LCS) using annual low-dose computed tomography (CT) scanning has been proven to significantly reduce lung cancer mortality by detecting cancerous lung nodules at an earlier stage. Risk stratification of malignancy in lung nodules can be enhanced using machine/deep learning algorithms. However, most existing algorithms: a) have primarily assessed single time-point CT data alone, thereby failing to utilize the inherent advantages contained within longitudinal imaging datasets; b) have not integrated pertinent clinical data that might inform risk prediction into computer models; c) have not assessed algorithm performance on the spectrum of nodules that are most challenging for radiologists to interpret and where assistance from analytic tools would be most beneficial. Here we show the performance of our time-series deep learning model (DeepCAD-NLM-L), which integrates multi-modal information across three longitudinal data domains: nodule-specific, lung-specific, and clinical demographic data. We compared our time-series deep learning model to a) radiologist performance on CTs from the National Lung Screening Trial enriched with the most challenging nodules for diagnosis, and b) a nodule management algorithm from a North London LCS study (SUMMIT). Our model demonstrated comparable and complementary performance to radiologists when interpreting challenging lung nodules and showed improved performance (AUC = 88%) against models utilizing single time-point data only. The results emphasise the importance of time-series, multi-modal analysis when interpreting malignancy risk in LCS.
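A heavily hedged sketch of one plausible way a longitudinal, multi-modal model like DeepCAD-NLM-L could fuse its three data domains: per-scan imaging features pass through an LSTM over time points and are then joined with static clinical features. All dimensions and layer choices below are illustrative assumptions, not the paper's architecture.

```python
# Late fusion of longitudinal scan features with static clinical data (sketch).
from tensorflow.keras import layers, Model

T, F_IMG, F_CLIN = 3, 128, 16          # 3 annual scans; assumed feature sizes

scans  = layers.Input(shape=(T, F_IMG), name='longitudinal_scan_features')
clinic = layers.Input(shape=(F_CLIN,),  name='clinical_demographics')

h = layers.LSTM(64)(scans)                      # temporal trend across scans
h = layers.concatenate([h, layers.Dense(16, activation='relu')(clinic)])
risk = layers.Dense(1, activation='sigmoid', name='malignancy_risk')(h)

model = Model([scans, clinic], risk)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['AUC'])
model.summary()
```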
The growing integration of distributed energy resources (DERs) in distribution grids raises various reliability issues due to DERs' uncertain and complex behaviors. With large-scale DER penetration in distribution grids, traditional outage detection methods, which rely on customers' reports and smart meters' last-gasp signals, perform poorly, because renewable generators, storage, and the mesh structure of urban distribution grids can continue supplying power after line outages. To address these challenges, we propose a data-driven outage monitoring approach based on stochastic time series analysis with a theoretical guarantee. Specifically, we prove via power flow analysis that the dependency structure of time-series voltage measurements changes significantly after line outages, which makes the theory of optimal change-point detection suitable for identifying them. However, existing change-point detection methods require the post-outage voltage distribution, which is unknown in distribution systems. We therefore design a maximum likelihood estimator that learns the distribution parameters directly from voltage data. We prove that detection based on the estimated parameters also achieves the optimal performance, making it extremely useful for fast distribution grid outage identification. Furthermore, since smart meters are widely installed in distribution grids while advanced infrastructure (e.g., PMUs) is not yet widely available, our approach requires only voltage magnitudes for quick outage identification. Simulation results show highly accurate outage identification in eight distribution grids with 14 configurations, with and without DERs, using smart meter data.
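A hedged sketch of the change-point idea on Gaussian voltage data: a CUSUM statistic accumulates the log-likelihood ratio between an estimated post-outage distribution and the pre-outage distribution, and flags a change when it crosses a threshold. The parameter values and threshold are illustrative stand-ins; the paper's estimator and optimality analysis are not reproduced here.

```python
# CUSUM change-point detection on simulated per-unit voltage magnitudes.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
pre  = rng.normal(1.00, 0.01, 300)            # voltage before the outage
post = rng.normal(0.97, 0.02, 100)            # shifted statistics afterwards
x = np.concatenate([pre, post])

mu0, sd0 = pre.mean(), pre.std()              # pre-change parameters
mu1, sd1 = 0.97, 0.02                         # stand-in for the MLE estimates

llr = norm.logpdf(x, mu1, sd1) - norm.logpdf(x, mu0, sd0)
s, threshold = 0.0, 20.0
for t, z in enumerate(llr):
    s = max(0.0, s + z)                       # CUSUM recursion
    if s > threshold:
        print(f"outage flagged at sample {t}")
        break
```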
Vanilla convolutional neural networks are known to provide superior performance not only in image recognition tasks but also in natural language processing and time series analysis. One of the strengths of convolutional layers is the ability to learn features about spatial relations in the input domain using variously parameterized convolutional kernels. However, in time series analysis learning such spatial relations is not necessarily required or effective. In such cases, kernels that model temporal dependencies, or kernels with broader spatial resolutions, enable more efficient training, as dilated convolutions have shown. However, the dilation rate has to be fixed a priori, which limits the flexibility of the kernels. We propose generalized dilation networks, which generalize standard dilated convolutions in two aspects. First, we derive an end-to-end learnable architecture for dilation layers in which the dilation rate itself can be learned. Second, we break up the strict dilation structure, in that we develop kernels operating independently in the input space.
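A hedged sketch of the "learnable dilation rate" idea: a 1-D convolution whose dilation is a continuous parameter, made differentiable by linearly interpolating the input at fractional tap positions. This is one plausible realization for illustration, not the paper's exact layer.

```python
# 1-D convolution with a learnable, continuous dilation rate (sketch).
import torch
import torch.nn as nn

class LearnableDilationConv1d(nn.Module):
    def __init__(self, channels, kernel_size, init_dilation=1.0):
        super().__init__()
        self.weight = nn.Parameter(0.1 * torch.randn(kernel_size, channels))
        self.log_dilation = nn.Parameter(torch.tensor(init_dilation).log())
        self.kernel_size = kernel_size

    def forward(self, x):                       # x: (batch, channels, time)
        d = self.log_dilation.exp()             # positive, learnable rate
        T = x.shape[-1]
        t = torch.arange(T, dtype=x.dtype, device=x.device)
        out = torch.zeros_like(x)
        for k in range(self.kernel_size):
            pos = t - k * d                     # fractional tap position
            lo = pos.floor().clamp(0, T - 1).long()
            hi = (lo + 1).clamp(max=T - 1)
            frac = (pos - lo.to(x.dtype)).clamp(0, 1)
            tap = (1 - frac) * x[..., lo] + frac * x[..., hi]
            tap = tap * (pos >= 0).to(x.dtype)  # zero padding on the left
            out = out + self.weight[k].view(1, -1, 1) * tap
        return out

layer = LearnableDilationConv1d(channels=4, kernel_size=3, init_dilation=2.0)
y = layer(torch.randn(2, 4, 50))
y.sum().backward()                              # gradients reach the rate
print(layer.log_dilation.grad)
```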
Multi-step prediction is of major significance for time series analysis in many real-life problems. Existing methods mainly focus on one-step-ahead forecasting, since multi-step forecasting generally fails due to the accumulation of prediction errors. This paper presents a novel approach for predicting plant growth in agriculture, focusing on prediction of plant Stem Diameter Variations (SDV). The proposed approach consists of three main steps. First, wavelet decomposition is applied to the original data so as to facilitate model fitting and reduce noise. Then an encoder-decoder framework is developed using Long Short-Term Memory (LSTM) units and used for appropriate feature extraction from the data. Finally, a recurrent neural network comprising LSTM units and an attention mechanism is proposed for modelling long-term dependencies in the time series data. Experimental results illustrate the good performance of the proposed approach: it significantly outperforms existing models in terms of error criteria such as RMSE, MAE and MAPE.
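A sketch of the wavelet pre-processing step, assuming PyWavelets as the implementation (the wavelet family, level, and threshold below are illustrative). The signal is decomposed, the detail coefficients are shrunk, and a denoised series is reconstructed before entering the LSTM encoder-decoder.

```python
# Wavelet decomposition and soft-threshold denoising of an SDV-like signal.
import numpy as np
import pywt

sdv = np.sin(np.linspace(0, 8 * np.pi, 512)) + 0.2 * np.random.randn(512)

coeffs = pywt.wavedec(sdv, 'db4', level=3)              # decomposition
threshold = 0.1
coeffs = [coeffs[0]] + [pywt.threshold(c, threshold, mode='soft')
                        for c in coeffs[1:]]            # shrink detail coeffs
denoised = pywt.waverec(coeffs, 'db4')[:len(sdv)]       # reconstruction
print(denoised.shape)
```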
The art of systematic financial trading has evolved through an array of approaches, ranging from simple strategies to complex algorithms, all relying primarily on aspects of time-series analysis. Recently, after visiting the trading floor of a leading financial institution, we noticed that traders always execute their trade orders while observing images of financial time series on their screens. In this work, we build upon the success of image recognition and examine the value of transforming traditional time-series analysis into image classification. We create a large sample of financial time-series images encoded as candlestick (box-and-whisker) charts and label the samples following three algebraically defined binary trade strategies. Using the images, we train over a dozen machine-learning classification models and find that the algorithms are very efficient at recovering the complicated, multiscale label-generating rules when the data is represented visually. We suggest that transforming a continuous numeric time-series classification problem into a vision problem is useful for recovering signals typical of technical analysis.
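A sketch of the encoding step: rendering a window of OHLC data as a candlestick chart image, assuming mplfinance as the plotting tool (the paper's rendering pipeline is not specified). Saved images would then be labeled by a trade rule and fed to image classifiers.

```python
# Render one OHLC window as a candlestick image (illustrative pipeline).
import numpy as np
import pandas as pd
import mplfinance as mpf

idx = pd.date_range('2020-01-01', periods=20, freq='D')
close = 100 + np.cumsum(np.random.randn(20))
open_ = close + 0.5 * np.random.randn(20)
ohlc = pd.DataFrame({'Open': open_,
                     'High': np.maximum(open_, close) + 0.5,
                     'Low':  np.minimum(open_, close) - 0.5,
                     'Close': close}, index=idx)

mpf.plot(ohlc, type='candle', style='classic',
         axisoff=True, savefig='window_000.png')
```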