Dynamic graph embeddings, inductive and incremental learning facilitate predictive tasks such as node classification and link prediction. However, predicting the structure of a graph at a future time step from a time series of graphs, allowing for new nodes has not gained much attention. In this paper, we present such an approach. We use time series methods to predict the node degree at future time points and combine it with flux balance analysis -- a linear programming method used in biochemistry -- to obtain the structure of future graphs. Furthermore, we explore the predictive graph distribution for different parameter values. We evaluate this method using synthetic and real datasets and demonstrate its utility and applicability.
Keystroke dynamics is a behavioural biometric utilised for user identification and authentication. We propose a new set of features based on the distance between keys on the keyboard, a concept that has not been considered before in keystroke dynamics. We combine flight times, a popular metric, with the distance between keys on the keyboard and call them as Distance Enhanced Flight Time features (DEFT). This novel approach provides comprehensive insights into a person's typing behaviour, surpassing typing velocity alone. We build a DEFT model by combining DEFT features with other previously used keystroke dynamic features. The DEFT model is designed to be device-agnostic, allowing us to evaluate its effectiveness across three commonly used devices: desktop, mobile, and tablet. The DEFT model outperforms the existing state-of-the-art methods when we evaluate its effectiveness across two datasets. We obtain accuracy rates exceeding 99% and equal error rates below 10% on all three devices.
Item Response Theory (IRT) has been proposed within the field of Educational Psychometrics to assess student ability as well as test question difficulty and discrimination power. More recently, IRT has been applied to evaluate machine learning algorithm performance on a single classification dataset, where the student is now an algorithm, and the test question is an observation to be classified by the algorithm. In this paper we present a modified IRT-based framework for evaluating a portfolio of algorithms across a repository of datasets, while simultaneously eliciting a richer suite of characteristics - such as algorithm consistency and anomalousness - that describe important aspects of algorithm performance. These characteristics arise from a novel inversion and reinterpretation of the traditional IRT model without requiring additional dataset feature computations. We test this framework on algorithm portfolios for a wide range of applications, demonstrating the broad applicability of this method as an insightful algorithm evaluation tool. Furthermore, the explainable nature of IRT parameters yield an increased understanding of algorithm portfolios.
Detecting anomalies from a series of temporal networks has many applications, including road accidents in transport networks and suspicious events in social networks. While there are many methods for network anomaly detection, statistical methods are under utilised in this space even though they have a long history and proven capability in handling temporal dependencies. In this paper, we introduce \textit{oddnet}, a feature-based network anomaly detection method that uses time series methods to model temporal dependencies. We demonstrate the effectiveness of oddnet on synthetic and real-world datasets. The R package oddnet implements this algorithm.
Many water-quality monitoring programs aim to measure turbidity to help guide effective management of waterways and catchments, yet distributing turbidity sensors throughout networks is typically cost prohibitive. To this end, we built and compared the ability of dynamic regression (ARIMA), long short-term memory neural nets (LSTM), and generalized additive models (GAM) to forecast stream turbidity one step ahead, using surrogate data from relatively low-cost in-situ sensors and publicly available databases. We iteratively trialled combinations of four surrogate covariates (rainfall, water level, air temperature and total global solar exposure) selecting a final model for each type that minimised the corrected Akaike Information Criterion. Cross-validation using a rolling time-window indicated that ARIMA, which included the rainfall and water-level covariates only, produced the most accurate predictions, followed closely by GAM, which included all four covariates. We constructed a meta-model, trained on time-series features of turbidity, to take advantage of the strengths of each model over different time points and predict the best model (that with the lowest forecast error one-step prior) for each time step. The meta-model outperformed all other models, indicating that this methodology can yield high accuracy and may be a viable alternative to using measurements sourced directly from turbidity-sensors where costs prohibit their deployment and maintenance, and when predicting turbidity across the short term. Our findings also indicated that temperature and light-associated variables, for example underwater illuminance, may hold promise as cost-effective, high-frequency surrogates of turbidity, especially when combined with other covariates, like rainfall, that are typically measured at coarse levels of spatial resolution.
Constructing an ensemble from a heterogeneous set of unsupervised anomaly detection methods is challenging because the class labels or the ground truth is unknown. Thus, traditional ensemble techniques that use the response variable or the class labels cannot be used to construct an ensemble for unsupervised anomaly detection. We use Item Response Theory (IRT) -- a class of models used in educational psychometrics to assess student and test question characteristics -- to construct an unsupervised anomaly detection ensemble. IRT's latent trait computation lends itself to anomaly detection because the latent trait can be used to uncover the hidden ground truth. Using a novel IRT mapping to the anomaly detection problem, we construct an ensemble that can downplay noisy, non-discriminatory methods and accentuate sharper methods. We demonstrate the effectiveness of the IRT ensemble on an extensive data repository, by comparing its performance to other ensemble techniques.