Human behavior modeling deals with learning and understanding of behavior patterns inherent in humans' daily routines. Existing pattern mining techniques either assume human dynamics is strictly periodic, or require the number of modes as input, or do not consider uncertainty in the sensor data. To handle these issues, in this paper, we propose a novel clustering approach for modeling human behavior (named, MTpattern) from time-series data. For mining frequent human behavior patterns effectively, we utilize a three-stage pipeline: (1) represent time series data into sequence of regularly sampled equal-sized unit time intervals for better analysis, (2) a new distance measure scheme is proposed to cluster similar sequences which can handle temporal variation and uncertainty in the data, and (3) exploit an exemplar-based clustering mechanism and fine-tune its parameters to output minimum number of clusters with given permissible distance constraints and without knowing the number of modes present in the data. Then, the average of all sequences in a cluster is considered as a human behavior pattern. Empirical studies on two real-world datasets and a simulated dataset demonstrate the effectiveness of MTpattern w.r.to internal and external measures of clustering.

Stock price forecasting is a highly complex and vitally important field of research. Recent advancements in deep neural network technology allow researchers to develop highly accurate models to predict financial trends. We propose a novel deep learning model called ST-GAN, or Stochastic Time-series Generative Adversarial Network, that analyzes both financial news texts and financial numerical data to predict stock trends. We utilize cutting-edge technology like the Generative Adversarial Network (GAN) to learn the correlations among textual and numerical data over time. We develop a new method of training a time-series GAN directly using the learned representations of Naive Bayes' sentiment analysis on financial text data alongside technical indicators from numerical data. Our experimental results show significant improvement over various existing models and prior research on deep neural networks for stock price forecasting.

A large fraction of the electronic health records (EHRs) consists of clinical measurements collected over time, such as lab tests and vital signs, which provide important information about a patient's health status. These sequences of clinical measurements are naturally represented as time series, characterized by multiple variables and large amounts of missing data, which complicate the analysis. In this work, we propose a novel kernel which is capable of exploiting both the information from the observed values as well the information hidden in the missing patterns in multivariate time series (MTS) originating e.g. from EHRs. The kernel, called TCK$_{IM}$, is designed using an ensemble learning strategy in which the base models are novel mixed mode Bayesian mixture models which can effectively exploit informative missingness without having to resort to imputation methods. Moreover, the ensemble approach ensures robustness to hyperparameters and therefore TCK$_{IM}$ is particularly well suited if there is a lack of labels - a known challenge in medical applications. Experiments on three real-world clinical datasets demonstrate the effectiveness of the proposed kernel.

Synthetic medical data which preserves privacy while maintaining utility can be used as an alternative to real medical data, which has privacy costs and resource constraints associated with it. At present, most models focus on generating cross-sectional health data which is not necessarily representative of real data. In reality, medical data is longitudinal in nature, with a single patient having multiple health events, non-uniformly distributed throughout their lifetime. These events are influenced by patient covariates such as comorbidities, age group, gender etc. as well as external temporal effects (e.g. flu season). While there exist seminal methods to model time series data, it becomes increasingly challenging to extend these methods to medical event time series data. Due to the complexity of the real data, in which each patient visit is an event, we transform the data by using summary statistics to characterize the events for a fixed set of time intervals, to facilitate analysis and interpretability. We then train a generative adversarial network to generate synthetic data. We demonstrate this approach by generating human sleep patterns, from a publicly available dataset. We empirically evaluate the generated data and show close univariate resemblance between synthetic and real data. However, we also demonstrate how stratification by covariates is required to gain a deeper understanding of synthetic data quality.

The Deep Space Network is NASA's international array of antennas that support interplanetary spacecraft missions. A track is a block of multi-dimensional time series from the beginning to end of DSN communication with the target spacecraft, containing thousands of monitor data items lasting several hours at a frequency of 0.2-1Hz. Monitor data on each track reports on the performance of specific spacecraft operations and the DSN itself. DSN is receiving signals from 32 spacecraft across the solar system. DSN has pressure to reduce costs while maintaining the quality of support for DSN mission users. DSN Link Control Operators need to simultaneously monitor multiple tracks and identify anomalies in real time. DSN has seen that as the number of missions increases, the data that needs to be processed increases over time. In this project, we look at the last 8 years of data for analysis. Any anomaly in the track indicates a problem with either the spacecraft, DSN equipment, or weather conditions. DSN operators typically write Discrepancy Reports for further analysis. It is recognized that it would be quite helpful to identify 10 similar historical tracks out of the huge database to quickly find and match anomalies. This tool has three functions: (1) identification of the top 10 similar historical tracks, (2) detection of anomalies compared to the reference normal track, and (3) comparison of statistical differences between two given tracks. The requirements for these features were confirmed by survey responses from 21 DSN operators and engineers. The preliminary machine learning model has shown promising performance (AUC=0.92). We plan to increase the number of data sets and perform additional testing to improve performance further before its planned integration into the track visualizer interface to assist DSN field operators and engineers.

Time series prediction is a widespread and well studied problem with applications in many domains (medical, geoscience, network analysis, finance, econometry etc.). In the case of multivariate time series, the key to good performances is to properly capture the dependencies between the variates. Often, these variates are structured, i.e. they are localised in an abstract space, usually representing an aspect of the physical world, and prediction amounts to a form of diffusion of the information across that space over time. Several neural network models of diffusion have been proposed in the literature. However, most of the existing proposals rely on some a priori knowledge on the structure of the space, usually in the form of a graph weighing the pairwise diffusion capacity of its points. We argue that this piece of information can often be dispensed with, since data already contains the diffusion capacity information, and in a more reliable form than that obtained from the usually largely hand-crafted graphs. We propose instead a fully data-driven model which does not rely on such a graph, nor any other prior structural information. We conduct a first set of experiments to measure the impact on performance of a structural prior, as used in baseline models, and show that, except at very low data levels, it remains negligible, and beyond a threshold, it may even become detrimental. We then investigate, through a second set of experiments, the capacity of our model in two respects: treatment of missing data and domain adaptation.

sktime is an open source, Python based, sklearn compatible toolkit for time series analysis developed by researchers at the University of East Anglia (UEA), University College London and the Alan Turing Institute. A key initial goal for sktime was to provide time series classification functionality equivalent to that available in a related java package, tsml, also developed at UEA. We describe the implementation of six such classifiers in sktime and compare them to their tsml equivalents. We demonstrate correctness through equivalence of accuracy on a range of standard test problems and compare the build time of the different implementations. We find that there is significant difference in accuracy on only one of the six algorithms we look at (Proximity Forest). This difference is causing us some pain in debugging. We found a much wider range of difference in efficiency. Again, this was not unexpected, but it does highlight ways both toolkits could be improved.

This paper is concerned with the statistical analysis of matrix-valued time series. These are data collected over a network of sensors (typically a set of spatial locations), recording, over time, observations of multiple measurements. From such data, we propose to learn, in an online fashion, a graph that captures two aspects of dependency: one describing the sparse spatial relationship between sensors, and the other characterizing the measurement relationship. To this purpose, we introduce a novel multivariate autoregressive model to infer the graph topology encoded in the coefficient matrix which captures the sparse Granger causality dependency structure present in such matrix-valued time series. We decompose the graph by imposing a Kronecker sum structure on the coefficient matrix. We develop two online approaches to learn the graph in a recursive way. The first one uses Wald test for the projected OLS estimation, where we derive the asymptotic distribution for the estimator. For the second one, we formalize a Lasso-type optimization problem. We rely on homotopy algorithms to derive updating rules for estimating the coefficient matrix. Furthermore, we provide an adaptive tuning procedure for the regularization parameter. Numerical experiments using both synthetic and real data, are performed to support the effectiveness of the proposed learning approaches.

<<

28

29

30

31

32

33

34

35

36

37

38

39

40

>>