Financial markets are a source of non-stationary multidimensional time series which has been drawing attention for decades. Each financial instrument has its specific changing over time properties, making their analysis a complex task. Improvement of understanding and development of methods for financial time series analysis is essential for successful operation on financial markets. In this study we propose a volume-based data pre-processing method for making financial time series more suitable for machine learning pipelines. We use a statistical approach for assessing the performance of the method. Namely, we formally state the hypotheses, set up associated classification tasks, compute effect sizes with confidence intervals, and run statistical tests to validate the hypotheses. We additionally assess the trading performance of the proposed method on historical data and compare it to a previously published approach. Our analysis shows that the proposed volume-based method allows successful classification of the financial time series patterns, and also leads to better classification performance than a price action-based method, excelling specifically on more liquid financial instruments. Finally, we propose an approach for obtaining feature interactions directly from tree-based models on example of CatBoost estimator, as well as formally assess the relatedness of the proposed approach and SHAP feature interactions with a positive outcome.
Stellar photospheric activity is known to limit the detection and characterisation of extra-solar planets. In particular, the study of Earth-like planets around Sun-like stars requires data analysis methods that can accurately model the stellar activity phenomena affecting radial velocity (RV) measurements. Gaussian Process Regression Networks (GPRNs) offer a principled approach to the analysis of simultaneous time-series, combining the structural properties of Bayesian neural networks with the non-parametric flexibility of Gaussian Processes. Using HARPS-N solar spectroscopic observations encompassing three years, we demonstrate that this framework is capable of jointly modelling RV data and traditional stellar activity indicators. Although we consider only the simplest GPRN configuration, we are able to describe the behaviour of solar RV data at least as accurately as previously published methods. We confirm the correlation between the RV and stellar activity time series reaches a maximum at separations of a few days, and find evidence of non-stationary behaviour in the time series, associated with an approaching solar activity minimum.
We develop an interpretable and learnable Wigner-Ville distribution that produces a super-resolved quadratic signal representation for time-series analysis. Our approach has two main hallmarks. First, it interpolates between known time-frequency representations (TFRs) in that it can reach super-resolution with increased time and frequency resolution beyond what the Heisenberg uncertainty principle prescribes and thus beyond commonly employed TFRs, Second, it is interpretable thanks to an explicit low-dimensional and physical parameterization of the Wigner-Ville distribution. We demonstrate that our approach is able to learn highly adapted TFRs and is ready and able to tackle various large-scale classification tasks, where we reach state-of-the-art performance compared to baseline and learned TFRs.
Time series analysis is quickly proceeding towards long and complex tasks. In recent years, fast approximate algorithms for discord search have been proposed in order to compensate for the increasing size of the time series. It is more interesting, however, to find quick exact solutions. In this research, we improved HOT SAX by exploiting two main ideas: the warm-up process, and the similarity between sequences close in time. The resulting algorithm, called HOT SAX Time (HST), has been validated with real and synthetic time series, and successfully compared with HOT SAX, RRA, SCAMP, and DADD. The complexity of a discord search has been evaluated with a new indicator, the cost per sequence (cps), which allows one to compare searches on time series of different lengths. Numerical evidence suggests that two conditions are involved in determining the complexity of a discord search in a non-trivial way: the length of the discords, and the noise/signal ratio. In the case of complex searches, HST can be more than 100 times faster than HOT SAX, thus being at the forefront of the exact discord search.
Mobile health apps are revolutionizing the healthcare ecosystem by improving communication, efficiency, and quality of service. In low- and middle-income countries, they also play a unique role as a source of information about health outcomes and behaviors of patients and healthcare workers, while providing a suitable channel to deliver both personalized and collective policy interventions. We propose a framework to study user engagement with mobile health, focusing on healthcare workers and digital health apps designed to support them in resource-poor settings. The behavioral logs produced by these apps can be transformed into daily time series characterizing each user's activity. We use probabilistic and survival analysis to build multiple personalized measures of meaningful engagement, which could serve to tailor content and digital interventions suiting each health worker's specific needs. Special attention is given to the problem of detecting churn, understood as a marker of complete disengagement. We discuss the application of our methods to the Indian and Ethiopian users of the Safe Delivery App, a capacity-building tool for skilled birth attendants. This work represents an important step towards a full characterization of user engagement in mobile health applications, which can significantly enhance the abilities of health workers and, ultimately, save lives.
Koopman Mode Decomposition (KMD) is a technique of nonlinear time-series analysis that originates from point spectrum of the Koopman operator defined for an underlying nonlinear dynamical system. We present a numerical algorithm of KMD based on Gaussian process regression that is capable of handling noisy finite-time data. The algorithm is applied to short-term swing dynamics of a multi-machine power grid in order to estimate oscillatory modes embedded in the dynamics, and thereby the effectiveness of the algorithm is evaluated.
Time series analysis plays a vital role in various applications, for instance, healthcare, weather prediction, disaster forecast, etc. However, to obtain sufficient shapelets by a feature network is still challenging. To this end, we propose a novel robust temporal feature network (RTFN) that contains temporal feature networks and attentional LSTM networks. The temporal feature networks are built to extract basic features from input data while the attentional LSTM networks are devised to capture complicated shapelets and relationships to enrich features. In experiments, we embed RTFN into supervised structure as a feature extraction network and into unsupervised clustering as an encoder, respectively. The results show that the RTFN-based supervised structure is a winner of 40 out of 85 datasets and the RTFN-based unsupervised clustering performs the best on 4 out of 11 datasets in the UCR2018 archive.
Path signatures are powerful nonparametric tools for time series analysis, shown to form a universal and characteristic feature map for Euclidean valued time series data. We lift the theory of path signatures to the setting of Lie group valued time series, adapting these tools for time series with underlying geometric constraints. We prove that this generalized path signature is universal and characteristic. To demonstrate universality, we analyze the human action recognition problem in computer vision, using $SO(3)$ representations for the time series, providing comparable performance to other shallow learning approaches, while offering an easily interpretable feature set. We also provide a two-sample hypothesis test for Lie group-valued random walks to illustrate its characteristic property. Finally we provide algorithms and a Julia implementation of these methods.
A fundamental problem in statistical data analysis is testing whether two phenomena are related. When the phenomena in question are time series, many challenges emerge. The first is defining a dependence measure between time series at the population level, as well as a sample level test statistic. The second is computing or estimating the distribution of this test statistic under the null, as the permutation test procedure is invalid for most time series structures. This work aims to address these challenges by combining distance correlation and multiscale graph correlation (MGC) from independence testing literature and block permutation testing from time series analysis. Two hypothesis tests for testing the independence of time series are proposed. These procedures also characterize whether the dependence relationship between the series is linear or nonlinear, and the time lag at which this dependence is maximized. For strictly stationary auto-regressive moving average (ARMA) processes, the proposed independence tests are proven valid and consistent. Finally, neural connectivity in the brain is analyzed using fMRI data, revealing linear dependence of signals within the visual network and default mode network, and nonlinear relationships in other regions. This work opens up new theoretical and practical directions for many modern time series analysis problems.