We present an end-to-end differentiable neural network architecture to perform anomaly detection in multivariate time series by incorporating a Sequential Probability Ratio Test on the prediction residual. The architecture is a cascade of dynamical systems designed to separate linearly predictable components of the signal such as trends and seasonality, from the non-linear ones. The former are modeled by local Linear Dynamic Layers, and their residual is fed to a generic Temporal Convolutional Network that also aggregates global statistics from different time series as context for the local predictions of each one. The last layer implements the anomaly detector, which exploits the temporal structure of the prediction residuals to detect both isolated point anomalies and set-point changes. It is based on a novel application of the classic CUMSUM algorithm, adapted through the use of a variational approximation of f-divergences. The model automatically adapts to the time scales of the observed signals. It approximates a SARIMA model at the get-go, and auto-tunes to the statistics of the signal and its covariates, without the need for supervision, as more data is observed. The resulting system, which we call STRIC, outperforms both state-of-the-art robust statistical methods and deep neural network architectures on multiple anomaly detection benchmarks.
Regularization and Bayesian methods for system identification have been repopularized in the recent years, and proved to be competitive w.r.t. classical parametric approaches. In this paper we shall make an attempt to illustrate how the use of regularization in system identification has evolved over the years, starting from the early contributions both in the Automatic Control as well as Econometrics and Statistics literature. In particular we shall discuss some fundamental issues such as compound estimation problems and exchangeability which play and important role in regularization and Bayesian approaches, as also illustrated in early publications in Statistics. The historical and foundational issues will be given more emphasis (and space), at the expense of the more recent developments which are only briefly discussed. The main reason for such a choice is that, while the recent literature is readily available, and surveys have already been published on the subject, in the author's opinion a clear link with past work had not been completely clarified.
This paper compares classical parametric methods with recently developed Bayesian methods for system identification. A Full Bayes solution is considered together with one of the standard approximations based on the Empirical Bayes paradigm. Results regarding point estimators for the impulse response as well as for confidence regions are reported.