
"Time Series Analysis": models, code, and papers

Uncovering life-course patterns with causal discovery and survival analysis

Jan 30, 2020
Bojan Kostic, Romain Crastes dit Sourd, Stephane Hess, Joachim Scheiner, Christian Holz-Rau, Francisco C. Pereira

We provide a novel approach and an exploratory study for modelling life event choices and occurrence from a probabilistic perspective through causal discovery and survival analysis. Our approach is formulated as a bi-level problem. In the upper level, we build the life events graph using causal discovery tools. In the lower level, survival analysis is applied to pairs of life events to model time-dependent transition probabilities. Several life events were analysed, such as getting married, buying a new car, childbirth, home relocation and divorce, together with socio-demographic attributes for survival modelling, including age, nationality, number of children, number of cars and home ownership. The data originates from a survey conducted in Dortmund, Germany, whose questionnaire contained a series of retrospective questions about residential and employment biography, travel behaviour and holiday trips, as well as socio-economic characteristics. Although survival analysis has been used in the past to analyse life-course data, this is the first time that a bi-level model has been formulated. The inclusion of a causal discovery algorithm in the upper level allows us to first identify causal relationships between life-course events and then understand the factors that might influence transition rates between events. This is very different from more classic choice models, where causal relationships are subject to expert interpretation based on model results.

* 26 pages, 10 figures 
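The lower-level time-to-event modelling rests on standard survival machinery. As a minimal illustration of the idea only (not the authors' bi-level model, which couples this with causal discovery and socio-demographic covariates), the sketch below computes a Kaplan-Meier survival curve for hypothetical waiting times between two life events, with right-censoring:

```python
# Minimal Kaplan-Meier estimator for time-to-event data with right-censoring.
# Durations are hypothetical waiting times (e.g. years from marriage to
# childbirth); event=1 means the transition was observed, 0 means censored.

def kaplan_meier(durations, events):
    """Return a list of (time, survival probability) steps."""
    order = sorted(range(len(durations)), key=lambda i: durations[i])
    at_risk = len(durations)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = durations[order[i]]
        deaths = 0
        n = at_risk
        # Group all subjects sharing this exact event time.
        while i < len(order) and durations[order[i]] == t:
            deaths += events[order[i]]
            at_risk -= 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n
            curve.append((t, survival))
    return curve

# Five hypothetical transitions, two of them censored:
# survival steps at t=1, 2 and 4, roughly 0.8, 0.6 and 0.3.
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 1, 0, 1, 0])
```

In the paper's setting one such curve would be estimated per pair of life events identified by the upper-level causal graph.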

Sequential Quantiles via Hermite Series Density Estimation

Mar 04, 2017
Michael Stephanou, Melvin Varughese, Iain Macdonald

Sequential quantile estimation refers to incorporating observations into quantile estimates in an incremental fashion, thus furnishing an online estimate of one or more quantiles at any given point in time. Sequential quantile estimation is also known as online quantile estimation. This area is relevant to the analysis of data streams and to the one-pass analysis of massive data sets. Applications include network traffic and latency analysis, real-time fraud detection and high-frequency trading. We introduce new techniques for online quantile estimation based on Hermite series estimators in the settings of static quantile estimation and dynamic quantile estimation. In the static quantile estimation setting we apply the existing Gauss-Hermite expansion in a novel manner. In particular, we exploit the fact that Gauss-Hermite coefficients can be updated in a sequential manner. To treat dynamic quantile estimation we introduce a novel expansion with an exponentially weighted estimator for the Gauss-Hermite coefficients, which we term the Exponentially Weighted Gauss-Hermite (EWGH) expansion. These algorithms go beyond existing sequential quantile estimation algorithms in that they allow arbitrary quantiles (as opposed to pre-specified quantiles) to be estimated at any point in time. In doing so we provide a solution to online distribution function and online quantile function estimation on data streams. In particular, we derive an analytical expression for the CDF and prove consistency results for the CDF under certain conditions. In addition, we analyse the associated quantile estimator. Simulation studies and tests on real data reveal the Gauss-Hermite-based algorithms to be competitive with a leading existing algorithm.

* Electron. J. Statist. 11 (2017), no. 1, 570--607 
* 43 pages, 9 figures. Improved version incorporating referee comments, as appears in Electronic Journal of Statistics 
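The one-pass principle can be illustrated with a much simpler device than the paper's Hermite series estimators: stochastic gradient descent on the pinball (quantile) loss, which tracks a single prespecified quantile with one update per observation. This is a sketch of the general idea only; unlike the EWGH expansion it cannot recover arbitrary quantiles after the fact:

```python
import random

def sequential_quantile(xs, tau, step=0.01):
    """One-pass tau-quantile estimate via SGD on the pinball loss.

    Each observation nudges the estimate up by step*tau if it lies above,
    or down by step*(1 - tau) if it lies below; the second half of the
    iterates is averaged to smooth the final estimate.
    """
    q = xs[0]
    tail = []
    for i, x in enumerate(xs[1:], start=1):
        q += step * (tau - (1.0 if x <= q else 0.0))
        if i > len(xs) // 2:
            tail.append(q)
    return sum(tail) / len(tail)

random.seed(0)
stream = [random.random() for _ in range(20000)]   # Uniform(0, 1) stream
median_est = sequential_quantile(stream, tau=0.5)  # close to the true median 0.5
```

The constant memory footprint is what makes such estimators suitable for data streams; the Hermite-series approach keeps this property while additionally exposing the whole quantile function.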

SummerTime: Variable-length Time Series Summarization with Applications to Physical Activity Analysis

Feb 20, 2020
Kevin M. Amaral, Zihan Li, Wei Ding, Scott Crouter, Ping Chen

SummerTime seeks to summarize a time series signal globally, providing a fixed-length, robust summarization of the variable-length time series. Many classical machine learning methods for classification and regression depend on data instances with a fixed number of features. As a result, those methods cannot be directly applied to variable-length time series data. One common approach is to perform classification over a sliding window on the data and aggregate the decisions made at local sections of the time series in some way, through majority voting for classification or averaging for regression. The downside to this approach is that minority local information is lost in the voting process, and averaging assumes that each time series measurement is equal in significance. Also, since time series can be of varying length, the quality of votes and averages can vary greatly in cases of a close voting tie or a bimodal distribution of regression outputs. The summarization produced by the SummerTime method is a fixed-length feature vector which can be used in place of the time series dataset with classical machine learning methods. We use Gaussian mixture models (GMMs) over small same-length disjoint windows in the time series to group local data into clusters. The time series' rate of membership in each cluster becomes a feature in the summarization. The model is naturally capable of converging to an appropriate cluster count. We compare our results to state-of-the-art studies in physical activity classification and show high-quality improvement by classifying with only the summarization. Finally, we show that regression using the summarization can augment energy expenditure estimation, producing more robust and precise results.

* 11 pages, 2 figures, 5 tables 
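A stripped-down version of the summarization idea can be sketched as follows, with deterministic 1-D k-means standing in for the paper's GMMs: cluster the disjoint-window statistics, then report each cluster's membership rate as one entry of a fixed-length feature vector.

```python
def summarize(series, window=5, iters=10):
    """Fixed-length summary of a variable-length series: the fraction of
    disjoint windows assigned to each of two clusters of window means.
    (Toy stand-in for SummerTime's GMM-based clustering.)"""
    means = [sum(series[i:i + window]) / window
             for i in range(0, len(series) - window + 1, window)]
    # Deterministic 1-D k-means, centroids initialised at the extremes.
    c = [min(means), max(means)]
    for _ in range(iters):
        groups = [[], []]
        for m in means:
            groups[0 if abs(m - c[0]) <= abs(m - c[1]) else 1].append(m)
        c = [sum(g) / len(g) if g else c[k] for k, g in enumerate(groups)]
    counts = [0, 0]
    for m in means:
        counts[0 if abs(m - c[0]) <= abs(m - c[1]) else 1] += 1
    return [n / len(means) for n in counts]

# A series with a low regime then a high regime: half the windows fall in
# each cluster, so the summary vector is [0.5, 0.5] regardless of length.
vec = summarize([0.0] * 10 + [10.0] * 10)
```

Because the output length depends only on the cluster count, series of any duration map to the same feature space, which is exactly what lets classical classifiers consume them.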

Making Good on LSTMs' Unfulfilled Promise

Nov 23, 2019
Daniel Philps, Artur d'Avila Garcez, Tillman Weyde

LSTMs promise much to financial time-series analysis, temporal and cross-sectional inference, but we find they do not deliver in a real-world financial management task. We examine an alternative called Continual Learning (CL), a memory-augmented approach which can provide transparent explanations: which memory did what, and when. This work has implications for many financial applications including credit, time-varying fairness in decision making and more. We make three important new observations. Firstly, as well as being more explainable, time-series CL approaches outperform LSTM and a simple sliding-window learner (a feed-forward neural net, FFNN). Secondly, we show that CL based on a sliding-window learner (FFNN) is more effective than CL based on a sequential learner (LSTM). Thirdly, we examine how real-world time-series noise impacts several similarity approaches used in CL memory addressing. We provide these insights using an approach called Continual Learning Augmentation (CLA), tested on a complex real-world problem: emerging-market equities investment decision making. CLA provides a test-bed as it can be based on different types of time-series learner, allowing LSTM and sliding-window (FFNN) learners to be tested side by side. CLA is also used to test several distance approaches used in a memory recall-gate: Euclidean distance (ED), dynamic time warping (DTW), auto-encoder (AE) and a novel hybrid approach, warp-AE. We find CLA out-performs simple LSTM and FFNN learners, and CLA based on a sliding window (CLA-FFNN) out-performs an LSTM (CLA-LSTM) implementation. For memory addressing, ED under-performs DTW and AE, while warp-AE shows the best overall performance in a real-world financial task.

* 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada. arXiv admin note: text overlap with arXiv:1812.02340 
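Two of the recall-gate distances compared above are standard and easy to sketch; the following toy shows why warping matters when series are noisy or misaligned (the AE and warp-AE distances, which involve learned encoders, are not reproduced here):

```python
import math

def euclidean(a, b):
    """Lock-step Euclidean distance; sequences must have equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    """Dynamic time warping distance with absolute-difference local cost,
    computed by the classic O(len(a)*len(b)) dynamic program."""
    INF = float("inf")
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]

a = [0, 0, 1, 2, 1, 0, 0]
b = [0, 1, 2, 1, 0, 0, 0]  # the same bump, shifted one step earlier
# DTW aligns the shifted bump exactly (distance 0), while the lock-step
# Euclidean distance penalises the shift.
```

In a memory recall-gate, the chosen distance decides which stored memory a new market state retrieves, so this robustness-to-misalignment trade-off directly shapes behaviour.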

Empirical Quantitative Analysis of COVID-19 Forecasting Models

Oct 01, 2021
Yun Zhao, Yuqing Wang, Junfeng Liu, Haotian Xia, Zhenni Xu, Qinghang Hong, Zhiyang Zhou, Linda Petzold

COVID-19 has been a public health emergency of international concern since early 2020. Reliable forecasting is critical to diminish the impact of this disease. To date, a large number of different forecasting models have been proposed, mainly including statistical models, compartmental models, and deep learning models. However, due to various uncertain factors across different regions such as economics and government policy, no forecasting model appears to be the best for all scenarios. In this paper, we perform a quantitative analysis of COVID-19 forecasting of confirmed cases and deaths across different regions in the United States with different forecasting horizons, and evaluate the relative impacts of the following three dimensions on the predictive performance (improvement and variation) through different evaluation metrics: model selection, hyperparameter tuning, and the length of time series required for training. We find that a dimension that brings about larger performance gains may also, when not well tuned, lead to harsher performance penalties. Furthermore, model selection is the dominant factor in determining the predictive performance. It is responsible for both the largest improvement and the largest variation in performance in all prediction tasks across different regions. While practitioners may perform more complicated time series analysis in practice, they should be able to achieve reasonable results if they have adequate insight into key decisions like model selection.

* ICDM workshop 2021 
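The point that model selection dominates can be made concrete with a tiny walk-forward evaluation harness: fit on a growing history, forecast one step ahead, and compare RMSE across candidate models. The models and series below are illustrative stand-ins, not those of the study:

```python
import math

def walk_forward_rmse(series, model, start=3):
    """One-step-ahead walk-forward evaluation: forecast series[t] from
    series[:t] for every t >= start, then return the RMSE."""
    errs = [(model(series[:t]) - series[t]) ** 2
            for t in range(start, len(series))]
    return math.sqrt(sum(errs) / len(errs))

naive = lambda hist: hist[-1]                          # persistence forecast
trend = lambda hist: hist[-1] + (hist[-1] - hist[-2])  # local linear trend

# Synthetic, linearly growing "case counts": the trend model captures the
# dynamics exactly, the persistence model is always one step behind.
cases = [2 * t for t in range(30)]
rmse_naive = walk_forward_rmse(cases, naive)   # 2.0
rmse_trend = walk_forward_rmse(cases, trend)   # 0.0
```

Swapping the model changes the error far more than tweaking either model's settings could, which is the shape of the paper's finding; on real data the same harness would be run per region and per horizon.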

Complex-valued Gaussian Process Regression for Time Series Analysis

Dec 07, 2017
Luca Ambrogioni, Eric Maris

The construction of synthetic complex-valued signals from real-valued observations is an important step in many time series analysis techniques. The most widely used approach is based on the Hilbert transform, which maps the real-valued signal into its quadrature component. In this paper, we define a probabilistic generalization of this approach. We model the observable real-valued signal as the real part of a latent complex-valued Gaussian process. In order to obtain the appropriate statistical relationship between its real and imaginary parts, we define two new classes of complex-valued covariance functions. Through an analysis of simulated chirplets and stochastic oscillations, we show that the resulting Gaussian process complex-valued signal provides a better estimate of the instantaneous amplitude and frequency than the established approaches. Furthermore, complex-valued Gaussian process regression makes it possible to incorporate prior information about the structure of the signal and noise, and thereby to tailor the analysis to the features of the signal. As an example, we analyze the non-stationary dynamics of brain oscillations in the alpha band, as measured using magnetoencephalography.
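The classical baseline being generalised here, the Hilbert-transform analytic signal, can be sketched in a few lines with a plain DFT (an O(n²) toy for clarity; in practice scipy.signal.hilbert does the same thing efficiently):

```python
import cmath
import math

def dft(x, sign=-1):
    """Naive discrete Fourier transform; sign=+1 gives the unnormalised inverse."""
    n = len(x)
    return [sum(xv * cmath.exp(sign * 2j * math.pi * k * t / n)
                for t, xv in enumerate(x)) for k in range(n)]

def analytic_signal(x):
    """Complex signal whose real part is x and whose imaginary part is the
    Hilbert transform of x: zero the negative frequencies, double the
    positive ones, and invert the DFT."""
    n = len(x)
    X = dft(x)
    for k in range(n):
        if 0 < k < n / 2:
            X[k] *= 2        # positive frequencies
        elif k > n / 2:
            X[k] = 0         # negative frequencies
    return [v / n for v in dft(X, sign=+1)]

n = 64
x = [math.cos(2 * math.pi * 8 * t / n) for t in range(n)]  # unit-amplitude oscillation
env = [abs(z) for z in analytic_signal(x)]  # instantaneous amplitude, ~1.0 throughout
```

The paper's Gaussian process formulation replaces this deterministic construction with a posterior over the complex signal, which is what lets prior knowledge about signal and noise enter the estimate.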


Detection of permanent structures from Sentinel-1 and Sentinel-2 time series data

Dec 11, 2019
André Neves, Carlos Damásio, João Pires, Fernando Birra

Mapping structures such as settlements, roads, individual houses and any other type of artificial structure is of great importance for the analysis of urban growth, masking, image alignment and, especially in the studied use case, the definition of Fuel Management Networks (FGC), which protect buildings from forest fires. Current cartography has a low generation frequency, and its resolution may not be suitable for extracting small structures such as small settlements or roads, which may thus lack forest fire protection. In this paper, we use time series data, extracted from the Sentinel-1 and 2 constellations over Santarém, Mação, to explore the detection of permanent structures at a resolution of 10 by 10 meters. For this purpose, an XGBoost classification model is trained with 133 attributes extracted from the time series of all the bands, including normalized radiometric indices. The results show that the use of time series data increases the accuracy of the extraction of permanent structures compared with using only static data; using multitemporal data also increases the number of detected roads. In general, the final result maps permanent structures at a higher resolution than state-of-the-art settlement maps, and small structures and roads are also more accurately represented. Regarding the use case, using our final map for the creation of FGC makes it possible to simplify and accelerate the process of delimiting the official FGC.

* 12 pages, in Portuguese, 7 figures, conference: INForum 2019 
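The per-pixel attributes fed to the classifier combine band statistics over the time series with normalised radiometric indices. A minimal sketch of that kind of feature extraction, using NDVI as one example index (the paper's full attribute set has 133 features across all bands):

```python
def ndvi(nir, red, eps=1e-9):
    """Normalised difference vegetation index for one observation date;
    eps guards against division by zero on dark pixels."""
    return (nir - red) / (nir + red + eps)

def pixel_features(nir_series, red_series):
    """Summary statistics of a pixel's NDVI time series, in the spirit of
    the per-band attributes used to train the classifier."""
    series = [ndvi(n, r) for n, r in zip(nir_series, red_series)]
    mean = sum(series) / len(series)
    var = sum((v - mean) ** 2 for v in series) / len(series)
    return {"ndvi_mean": mean, "ndvi_min": min(series),
            "ndvi_max": max(series), "ndvi_var": var}

# Hypothetical reflectances over three dates: a vegetated pixel keeps a
# consistently high NDVI, while built-up (permanent) structures stay low.
feats = pixel_features([0.5, 0.6, 0.55], [0.1, 0.12, 0.11])
```

It is the temporal stability of such indices, rather than any single date, that separates permanent structures from seasonal cover, which is why the multitemporal features outperform static ones.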

COVID-19 Data Analysis and Forecasting: Algeria and the World

Aug 22, 2020
Sami Belkacem

The novel coronavirus disease 2019, COVID-19, has been leading the world into a prominent crisis. As of May 19, 2020, the virus had spread to 215 countries, with more than 4,622,001 confirmed cases and 311,916 reported deaths worldwide, including Algeria with 7,201 cases and 555 deaths. Analyzing and forecasting the growth of COVID-19 cases and deaths could be useful in many ways: governments could estimate medical equipment needs and take appropriate policy responses, and experts could approximate the peak and the end of the disease. In this work, we first train a time series Prophet model to analyze and forecast the number of COVID-19 cases and deaths in Algeria based on the previously reported numbers. Then, to better understand the spread and the properties of COVID-19, we include external factors that may contribute to accelerating or slowing the spread of the virus, construct a dataset from reliable sources, and conduct a large-scale data analysis considering 82 countries worldwide. The evaluation results show that the time series Prophet model accurately predicts the number of cases and deaths in Algeria, with low RMSE scores of 218.87 and 4.79 respectively, while the forecast suggests that the total number of cases and deaths is expected to increase in the coming weeks. Moreover, the worldwide data-driven analysis reveals several correlations between the increase/decrease in the number of cases and deaths and external factors that may contribute to accelerating or slowing the spread of the virus, such as geographic, climatic, health, economic, and demographic factors.
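Prophet fits an additive model of trend plus seasonality to a dated series and extrapolates it. The shape of that exercise can be sketched with a closed-form least-squares trend fit on a synthetic daily case series; this is a stand-in for illustration only, not Prophet's actual decomposition, which also handles seasonality, changepoints and holidays:

```python
def fit_trend(y):
    """Ordinary least squares fit of y[t] = a + b*t; returns (a, b)."""
    n = len(y)
    t_bar = (n - 1) / 2
    y_bar = sum(y) / n
    b = (sum((t - t_bar) * (v - y_bar) for t, v in enumerate(y))
         / sum((t - t_bar) ** 2 for t in range(n)))
    return y_bar - b * t_bar, b

def forecast(y, horizon):
    """Extrapolate the fitted trend `horizon` steps past the series."""
    a, b = fit_trend(y)
    return [a + b * (len(y) + h) for h in range(horizon)]

# Synthetic cumulative case counts growing by exactly 50 per day for 60 days:
# the fitted trend recovers the growth, so the next week continues it exactly.
cases = [1000 + 50 * t for t in range(60)]
future = forecast(cases, horizon=7)   # 4000, 4050, ..., 4300
```

RMSE between such forecasts and the subsequently observed counts is the evaluation used in the paper (218.87 for cases, 4.79 for deaths in the Algerian data).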