Event occurrence is not only subject to the environmental changes, but is also facilitated by the events that have occurred in a system. Here, we develop a method for estimating such extrinsic and intrinsic factors from a single series of event-occurrence times. The analysis is performed using a model that combines the inhomogeneous Poisson process and the Hawkes process, which represent exogenous fluctuations and endogenous chain-reaction mechanisms, respectively. The model is fit to a given dataset by minimizing the free energy, for which statistical physics and a path-integral method are utilized. Because the process of event occurrence is stochastic, parameter estimation is inevitably accompanied by errors, and it can ultimately occur that exogenous and endogenous factors cannot be captured even with the best estimator. We obtained four regimes categorized according to whether respective factors are detected. By applying the analytical method to real time series of debate in a social-networking service, we have observed that the estimated exogenous and endogenous factors are close to the first comments and the follow-up comments, respectively. This method is general and applicable to a variety of data, and we have provided an application program, by which anyone can analyze any series of event times.
Machine learning algorithms are highly useful for the classification of time series data in astronomy in this era of peta-scale public survey data releases. These methods can facilitate the discovery of new unknown events in most astrophysical areas, as well as improving the analysis of samples of known phenomena. Machine learning algorithms use features extracted from collected data as input predictive variables. A public tool called Feature Analysis for Time Series (FATS) has proved an excellent workhorse for feature extraction, particularly light curve classification for variable objects. In this study, we present a major improvement to FATS, which corrects inconvenient design choices, minor details, and documentation for the re-engineering process. This improvement comprises a new Python package called "feets", which is important for future code-refactoring for astronomical software tools.
We develop an interpretable and learnable Wigner-Ville distribution that produces a super-resolved quadratic signal representation for time-series analysis. Our approach has two main hallmarks. First, it interpolates between known time-frequency representations (TFRs) in that it can reach super-resolution with increased time and frequency resolution beyond what the Heisenberg uncertainty principle prescribes and thus beyond commonly employed TFRs, Second, it is interpretable thanks to an explicit low-dimensional and physical parameterization of the Wigner-Ville distribution. We demonstrate that our approach is able to learn highly adapted TFRs and is ready and able to tackle various large-scale classification tasks, where we reach state-of-the-art performance compared to baseline and learned TFRs.
Vanilla convolutional neural networks are known to provide superior performance not only in image recognition tasks but also in natural language processing and time series analysis. One of the strengths of convolutional layers is the ability to learn features about spatial relations in the input domain using various parameterized convolutional kernels. However, in time series analysis learning such spatial relations is not necessarily required nor effective. In such cases, kernels which model temporal dependencies or kernels with broader spatial resolutions are recommended for more efficient training as proposed by dilation kernels. However, the dilation has to be fixed a priori which limits the flexibility of the kernels. We propose generalized dilation networks which generalize the initial dilations in two aspects. First we derive an end-to-end learnable architecture for dilation layers where also the dilation rate can be learned. Second we break up the strict dilation structure, in that we develop kernels operating independently in the input space.
Prediction of user traffic in cellular networks has attracted profound attention for improving resource utilization. In this paper, we study the problem of network traffic traffic prediction and classification by employing standard machine learning and statistical learning time series prediction methods, including long short-term memory (LSTM) and autoregressive integrated moving average (ARIMA), respectively. We present an extensive experimental evaluation of the designed tools over a real network traffic dataset. Within this analysis, we explore the impact of different parameters to the effectiveness of the predictions. We further extend our analysis to the problem of network traffic classification and prediction of traffic bursts. The results, on the one hand, demonstrate superior performance of LSTM over ARIMA in general, especially when the length of the training time series is high enough, and it is augmented by a wisely-selected set of features. On the other hand, the results shed light on the circumstances in which, ARIMA performs close to the optimal with lower complexity.
Neural networks-based learning of the distribution of non-dispatchable renewable electricity generation from sources such as photovoltaics (PV) and wind as well as load demands has recently gained attention. Normalizing flow density models have performed particularly well in this task due to the training through direct log-likelihood maximization. However, research from the field of image generation has shown that standard normalizing flows can only learn smeared-out versions of manifold distributions and can result in the generation of noisy data. To avoid the generation of time series data with unrealistic noise, we propose a dimensionality-reducing flow layer based on the linear principal component analysis (PCA) that sets up the normalizing flow in a lower-dimensional space. We train the resulting principal component flow (PCF) on data of PV and wind power generation as well as load demand in Germany in the years 2013 to 2015. The results of this investigation show that the PCF preserves critical features of the original distributions, such as the probability density and frequency behavior of the time series. The application of the PCF is, however, not limited to renewable power generation but rather extends to any data set, time series, or otherwise, which can be efficiently reduced using PCA.
We consider the viability of a modularised mechanistic online machine learning framework to learn signals in low-frequency financial time series data. The framework is proved on daily sampled closing time-series data from JSE equity markets. The input patterns are vectors of pre-processed sequences of daily, weekly and monthly or quarterly sampled feature changes. The data processing is split into a batch processed step where features are learnt using a stacked autoencoder via unsupervised learning, and then both batch and online supervised learning are carried out using these learnt features, with the output being a point prediction of measured time-series feature fluctuations. Weight initializations are implemented with restricted Boltzmann machine pre-training, and variance based initializations. Historical simulations are then run using an online feedforward neural network initialised with the weights from the batch training and validation step. The validity of results are considered under a rigorous assessment of backtest overfitting using both combinatorially symmetrical cross validation and probabilistic and deflated Sharpe ratios. Results are used to develop a view on the phenomenology of financial markets and the value of complex historical data-analysis for trading under the unstable adaptive dynamics that characterise financial markets.
The striking fractal geometry of strange attractors underscores the generative nature of chaos: like probability distributions, chaotic systems can be repeatedly measured to produce arbitrarily-detailed information about the underlying attractor. Chaotic systems thus pose a unique challenge to modern statistical learning techniques, while retaining quantifiable mathematical properties that make them controllable and interpretable as benchmarks. Here, we present a growing database currently comprising 131 known chaotic dynamical systems spanning fields such as astrophysics, climatology, and biochemistry. Each system is paired with precomputed multivariate and univariate time series. Our dataset has comparable scale to existing static time series databases; however, our systems can be re-integrated to produce additional datasets of arbitrary length and granularity. Our dataset is annotated with known mathematical properties of each system, and we perform feature analysis to broadly categorize the diverse dynamics present across the collection. Chaotic systems inherently challenge forecasting models, and across extensive benchmarks we correlate forecasting performance with the degree of chaos present. We also exploit the unique generative properties of our dataset in several proof-of-concept experiments: surrogate transfer learning to improve time series classification, importance sampling to accelerate model training, and benchmarking symbolic regression algorithms.
Failure in brittle materials led by the evolution of micro- to macro-cracks under repetitive or increasing loads is often catastrophic with no significant plasticity to advert the onset of fracture. Early failure detection with respective location are utterly important features in any practical application, both of which can be effectively addressed using artificial intelligence. In this paper, we develop a supervised machine learning (ML) framework to predict failure in an isothermal, linear elastic and isotropic phase-field model for damage and fatigue of brittle materials. Time-series data of the phase-field model is extracted from virtual sensing nodes at different locations of the geometry. A pattern recognition scheme is introduced to represent time-series data/sensor nodes responses as a pattern with a corresponding label, integrated with ML algorithms, used for damage classification with identified patterns. We perform an uncertainty analysis by superposing random noise to the time-series data to assess the robustness of the framework with noise-polluted data. Results indicate that the proposed framework is capable of predicting failure with acceptable accuracy even in the presence of high noise levels. The findings demonstrate satisfactory performance of the supervised ML framework, and the applicability of artificial intelligence and ML to a practical engineering problem, i.,e, data-driven failure prediction in brittle materials.
A theoretical framework that supports automated construction of dynamic prime models purely from experimental time series data has been invented and developed, which can automatically generate (construct) data-driven models of any time series data in seconds. This has resulted in the formulation and formalisation of new reverse engineering and dynamic methods for automated systems modelling of complex systems, including complex biological, financial, control, and artificial neural network systems. The systems/model theory behind the invention has been formalised as a new, effective and robust system identification strategy complementary to process-based modelling. The proposed dynamic modelling and network inference solutions often involve tackling extremely difficult parameter estimation challenges, inferring unknown underlying network structures, and unsupervised formulation and construction of smart and intelligent ODE models of complex systems. In underdetermined conditions, i.e., cases of dealing with how best to instantaneously and rapidly construct data-consistent prime models of unknown (or well-studied) complex system from small-sized time series data, inference of unknown underlying network of interaction is more challenging. This article reports a robust step-by-step mathematical and computational analysis of the entire prime model construction process that determines a model from data in less than a minute.