While the acquisition of time series has become increasingly straightforward and sophisticated, developing dynamical models from time series remains a challenging and ever-evolving problem domain. In recent years, machine learning tools have been merged with the dynamic mode decomposition (DMD) to address this problem, and this general approach has proven to be an especially promising avenue for sophisticated and accurate model development. Building on this prior body of work, we develop a deep-learning, DMD-based method which uses the fundamental insight of Takens' Embedding Theorem to produce an adaptive learning scheme that better captures higher-dimensional and chaotic dynamics. We call this method the Deep Learning Hankel DMD (DLHDMD). We show that the DLHDMD generates accurate dynamics for chaotic time series, and we likewise explore how our method learns mappings which, after successful training, tend to significantly change the mutual information between dimensions in the dynamics. This appears to be a key feature in enhancing the DMD overall, and it should help provide further insight for developing more sophisticated deep learning methods for time series forecasting.
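The Hankel-DMD core underlying this approach can be sketched as follows: a scalar series is delay-embedded in the spirit of Takens' theorem, and exact DMD is then applied to the resulting Hankel matrix. This is a minimal, generic sketch of that core step, not the full DLHDMD pipeline (the deep-learning component is omitted); all function and variable names are ours.

```python
import numpy as np

def hankel_dmd(x, delays, rank):
    """Exact DMD on a Takens-style delay embedding of a scalar time series."""
    n = len(x) - delays + 1
    H = np.array([x[i:i + n] for i in range(delays)])   # Hankel (delay-embedded) data
    X, Y = H[:, :-1], H[:, 1:]                          # snapshot pairs, Y ≈ A X
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]         # rank truncation
    A_tilde = U.conj().T @ Y @ Vh.conj().T @ np.diag(1.0 / s)  # reduced operator
    evals, W = np.linalg.eig(A_tilde)
    modes = Y @ Vh.conj().T @ np.diag(1.0 / s) @ W      # exact DMD modes
    return evals, modes
```

For a pure sinusoid, which satisfies an exact second-order linear recurrence in delay coordinates, this recovers eigenvalues on the unit circle at the sampling frequency.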
Reservoir computing is a best-in-class machine learning algorithm for processing information generated by dynamical systems using observed time-series data. Importantly, it requires very small training data sets, uses linear optimization, and thus requires minimal computing resources. However, the algorithm uses randomly sampled matrices to define the underlying recurrent neural network and has a multitude of metaparameters that must be optimized. Recent results demonstrate the equivalence of reservoir computing to nonlinear vector autoregression, which requires no random matrices, fewer metaparameters, and provides interpretable results. Here, we demonstrate that nonlinear vector autoregression excels at reservoir computing benchmark tasks and requires even shorter training data sets and training time, heralding the next generation of reservoir computing.
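The nonlinear vector autoregression (NVAR) idea can be sketched in a few lines: build feature vectors from a constant, a few delayed states, and their quadratic products, then ridge-regress the next state onto them, with no random matrices. This is an illustrative sketch under our own naming conventions, not the authors' reference implementation.

```python
import numpy as np

def nvar_features(X, k):
    """Constant + k delayed states + their unique quadratic monomials."""
    T, d = X.shape
    lin = np.hstack([X[k - 1 - j : T - j] for j in range(k)])   # lagged states
    quad = np.stack([lin[:, i] * lin[:, j]
                     for i in range(lin.shape[1])
                     for j in range(i, lin.shape[1])], axis=1)  # quadratic terms
    return np.hstack([np.ones((lin.shape[0], 1)), lin, quad])

def nvar_fit(X, k=2, ridge=1e-8):
    """Ridge regression of the next state onto NVAR features (linear optimization)."""
    Phi = nvar_features(X[:-1], k)   # features at times k-1 .. T-2
    Y = X[k:]                        # targets: the next states
    return np.linalg.solve(Phi.T @ Phi + ridge * np.eye(Phi.shape[1]), Phi.T @ Y)
```

Because the feature map contains quadratic monomials, a quadratic map such as the logistic map is represented exactly, so one-step prediction is near-machine-precision.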
Data-driven sparse system identification has become a general framework for a wide range of problems in science and engineering, and it is of growing importance in applied machine learning and artificial intelligence. In this work, we develop the Entropic Regression Software Package (ERFit), a MATLAB package for sparse system identification using the entropic regression method. The code requires minimal supervision and offers a wide range of options that allow it to adapt easily to different problems in science and engineering. ERFit is available at https://github.com/almomaa/ERFit-Package
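For readers unfamiliar with the sparse-identification setting ERFit addresses, the generic problem can be illustrated with a standard baseline, sequentially thresholded least squares (SINDy-style). Note this baseline is not the entropic regression method itself; it only shows the shape of the problem, namely recovering a sparse coefficient matrix from a dictionary of candidate terms.

```python
import numpy as np

def stlsq(Theta, dX, threshold=0.1, iters=10):
    """Sequentially thresholded least squares: a generic sparse-identification
    baseline (not entropic regression). Theta: dictionary matrix; dX: targets."""
    Xi = np.linalg.lstsq(Theta, dX, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0                      # prune small coefficients
        for j in range(dX.shape[1]):         # refit on the surviving support
            big = ~small[:, j]
            if big.any():
                Xi[big, j] = np.linalg.lstsq(Theta[:, big], dX[:, j], rcond=None)[0]
    return Xi
```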
Boolean functions and networks are commonly used in the modeling and analysis of complex biological systems, and this paradigm is highly relevant in other important areas of data science and decision making, such as the medical field and the finance industry. Automated learning of a Boolean network and its Boolean functions from data is a challenging task, due in part to the large number of unknowns (including both the structure of the network and the functions) to be estimated, for which a brute-force approach would be exponentially complex. In this paper we develop a new information-theoretic methodology that we show to be significantly more efficient than previous approaches. Building on the recently developed optimal causation entropy principle (oCSE), which we previously proved can correctly infer networks by distinguishing direct from indirect connections, we develop here an efficient algorithm that furthermore infers a Boolean network (including both its structure and function) from data observed at the evolving states of its nodes. We call this new inference method Boolean optimal causation entropy (BoCSE), and we show that it is both computationally efficient and resilient to noise. Furthermore, it allows for the selection of a set of features that best explains the process, which can be described as a networked Boolean function reduced-order model. We apply our method to feature selection in several real-world examples: (1) diagnosis of urinary diseases, (2) cardiac SPECT diagnosis, (3) informative positions in the game Tic-Tac-Toe, and (4) risk causality analysis of loans in default status. Our proposed method is effective and efficient in all examples.
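The information-theoretic flavor of this kind of inference can be illustrated with a simplified greedy selector: at each step, add the Boolean input that most reduces the empirical conditional entropy of the output given the inputs selected so far. This is only a sketch in the spirit of oCSE, not the BoCSE algorithm itself (which also includes a removal stage and statistical tests); all names are ours.

```python
import numpy as np

def joint_entropy(A):
    """Empirical joint entropy (bits) of the rows of a 2D array."""
    _, counts = np.unique(A, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def greedy_boolean_features(X, y, n_feats):
    """Greedily add the feature that most lowers H(y | selected features)."""
    selected = []
    for _ in range(n_feats):
        scores = []
        for j in range(X.shape[1]):
            if j in selected:
                scores.append(np.inf)
                continue
            cols = X[:, selected + [j]]
            # conditional entropy: H(y | cols) = H(y, cols) - H(cols)
            scores.append(joint_entropy(np.column_stack([cols, y]))
                          - joint_entropy(cols))
        selected.append(int(np.argmin(scores)))
    return selected
```

For an AND of two inputs buried among irrelevant ones, the selector recovers exactly the two relevant features.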
In this paper, we introduce a new tool for data-driven discovery of early warning signs of critical transitions in ice shelves from remote sensing data. Our approach adopts the principles of directed spectral clustering, using an asymmetric affinity matrix and the associated directed graph Laplacian. We apply our approach both to reprocessed ice velocity data and to remote sensing satellite images of the Larsen C ice shelf. Our results allow us to (post-cast) predict the fault lines responsible for the critical transition leading to the Larsen C crack and the resulting A68 iceberg, and we are able to do so months before the actual occurrence and much earlier than any previously available methodology, in particular those based on interferometry.
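The directed-spectral-clustering ingredient can be sketched generically: given an asymmetric affinity matrix, form a symmetric directed graph Laplacian weighted by the stationary distribution of the associated random walk (Chung's construction), and split the nodes by the sign of the second eigenvector. This is a generic sketch of that ingredient under assumptions of ours (teleportation parameter, power-iteration stationary solve), not the paper's full pipeline.

```python
import numpy as np

def directed_spectral_bipartition(W, alpha=0.95):
    """Bipartition a directed graph via a Chung-style directed Laplacian.
    W: asymmetric affinity matrix with positive row sums."""
    n = W.shape[0]
    P = W / W.sum(axis=1, keepdims=True)          # row-stochastic transition matrix
    P = alpha * P + (1 - alpha) / n               # teleportation for irreducibility
    phi = np.full(n, 1.0 / n)                     # stationary distribution by
    for _ in range(1000):                         # power iteration
        phi = phi @ P
    phi /= phi.sum()
    Ps = np.diag(np.sqrt(phi)) @ P @ np.diag(1.0 / np.sqrt(phi))
    L = np.eye(n) - 0.5 * (Ps + Ps.T)             # symmetric directed Laplacian
    _, vecs = np.linalg.eigh(L)
    return vecs[:, 1] > 0                         # sign of the Fiedler-like vector
```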
Representation of a dynamical system in terms of simplifying modes is a central premise of reduced-order modelling and a primary concern of the increasingly popular DMD (dynamic mode decomposition), the empirical counterpart of Koopman operator analysis of complex systems. In the spirit of optimal approximation and reduced-order modelling, the goal of DMD methods and their variants is to describe the dynamical evolution as a linear evolution in an appropriately transformed lower-rank space, as well as possible. However, as far as we know, there has not been an in-depth study of the underlying geometry as it relates to an efficient representation. To this end, we show that for a good dictionary, quite different from other constructions, we need only construct optimal initial data functions on a transverse codimension-one set; the eigenfunctions on a subdomain then follow by the method of characteristics. The underlying geometry of Koopman eigenfunctions involves an extreme multiplicity, whereby infinitely many eigenfunctions correspond to each eigenvalue, which we resolve with a new concept: a quotient set of functions defined in terms of matched level sets. We call such an equivalence class of functions a ``primary eigenfunction," which further helps us resolve the relationship between the large number of eigenfunctions and a perhaps otherwise low-dimensional phase space. This construction allows us to understand the geometric relationships between the numerous eigenfunctions in a useful way. We also discuss how the underlying spectral decomposition, into point spectrum and continuous spectrum, fundamentally relies on the domain.
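The method-of-characteristics construction rests on a standard identity (stated here in our own notation): for a flow $\dot{x} = F(x)$, a Koopman eigenfunction $\varphi$ with eigenvalue $\lambda$ satisfies a first-order linear PDE whose characteristics are the trajectories themselves,

```latex
F(x)\cdot\nabla\varphi(x) = \lambda\,\varphi(x)
\qquad\Longrightarrow\qquad
\varphi\bigl(x(t)\bigr) = e^{\lambda t}\,\varphi\bigl(x(0)\bigr),
```

so prescribing $\varphi$ on a transverse codimension-one set determines it along every trajectory crossing that set. The multiplicity mentioned above also follows: multiplying $\varphi$ by any function constant along trajectories yields another eigenfunction with the same eigenvalue.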
Optical flow refers to the visual motion observed between two consecutive images. Since the number of degrees of freedom is typically much larger than the number of constraints imposed by the image observations, the straightforward formulation of optical flow as an inverse problem is ill-posed. Standard approaches determine optical flow by formulating and solving an optimization problem that contains both a data fidelity term and a regularization term, the latter of which effectively resolves the otherwise ill-posed nature of the inverse problem. In this work, we depart from this deterministic formalism and instead treat optical flow as a statistical inverse problem. We discuss how a classical optical flow solution can be interpreted as a point estimate in this more general framework. The statistical approach, whose "solution" is a distribution of flow fields, which we refer to as Bayesian optical flow, allows not only "point" estimates (e.g., the computation of an average flow field) but also statistical estimates (e.g., quantification of uncertainty) that are beyond any standard method for optical flow. As an application, we benchmark Bayesian optical flow together with uncertainty quantification using several types of prescribed ground-truth flow fields and images.
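The point made here, that the classical solution is one summary of a whole posterior distribution, can be seen in a deliberately tiny linear-Gaussian toy model: linearized brightness constancy $I_x u + I_t \approx 0$ with Gaussian observation noise and a Gaussian prior on the flow $u$ gives a closed-form Gaussian posterior per pixel. This is an illustrative toy of ours, far simpler than the paper's full Bayesian optical flow.

```python
import numpy as np

def bayesian_flow_1d(Ix, It, sigma_obs=0.1, sigma_prior=1.0):
    """Posterior mean and variance of a 1D flow u per pixel, assuming
    Ix*u + It ~ N(0, sigma_obs^2) and the prior u ~ N(0, sigma_prior^2)."""
    prec = Ix**2 / sigma_obs**2 + 1.0 / sigma_prior**2   # posterior precision
    mean = (-Ix * It / sigma_obs**2) / prec              # posterior mean (= MAP)
    return mean, 1.0 / prec                              # variance quantifies uncertainty
```

The posterior mean reproduces the regularized point estimate, while the posterior variance is exactly the uncertainty information that deterministic formulations discard.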
This article addresses the problem of two- and higher-dimensional pattern matching, i.e., the identification of instances of a template within a larger signal space, which is a form of registration. Unlike traditional correlation, we aim at obtaining more selective matchings by considering stricter comparisons of gray-level intensity. In order to achieve fast matching, a nonlinear thresholded version of the fast Fourier transform is applied to a gray-level decomposition of the original 2D image. The potential of the method is demonstrated on real data involving the selective identification of neuronal cell bodies in gray-level images.
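The linear core that the nonlinear thresholded variant builds on is standard FFT cross-correlation via the convolution theorem. The sketch below shows only that core (the gray-level decomposition and thresholding steps are omitted), with names of our choosing.

```python
import numpy as np

def fft_match(image, template):
    """Circular cross-correlation of image and template via 2D FFTs.
    The peak index gives the top-left corner of the best match."""
    padded = np.zeros_like(image, dtype=float)
    th, tw = template.shape
    padded[:th, :tw] = template                       # zero-pad to image size
    # convolution theorem: correlation = IFFT( FFT(image) * conj(FFT(template)) )
    return np.fft.ifft2(np.fft.fft2(image) * np.conj(np.fft.fft2(padded))).real
```

Planting a template in an otherwise empty image, the correlation peak lands exactly at the planted location (Cauchy-Schwarz guarantees the aligned shift maximizes the inner product).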