Abstract:In classical canonical correlation analysis (CCA), the goal is to determine the linear transformations of two random vectors into two new random variables that are most strongly correlated. Canonical variables are pairs of these new random variables, while canonical correlations are correlations between these pairs. In this paper, we propose and study two generalizations of this classical method: (1) Instead of two random vectors we study more complex data structures that appear in important applications. In these structures, there are $L$ features, each described by $p_l$ scalars, $1 \le l \le L$. We observe $n$ such objects over $T$ time points. We derive a suitable analog of the CCA for such data. Our approach relies on embeddings into Reproducing Kernel Hilbert Spaces, and covers several related data structures as well. (2) We develop an analogous approach for multidimensional random processes. In this case, the experimental units are multivariate continuous, square-integrable functions over a given interval. These functions are modeled as elements of a Hilbert space, so in this case, we define the multiple functional canonical correlation analysis, MFCCA. We justify our approaches by their application to two data sets and suitable large sample theory. We derive consistency rates for the related transformation and correlation estimators, and show that it is possible to relax two common assumptions on the compactness of the underlying cross-covariance operators and the independence of the data.
Abstract:Statistical depth functions provide center-outward orderings in spaces of dimension larger than one, where a natural ordering does not exist. The numerical evaluation of such depth functions can be computationally prohibitive, even for relatively low dimensions. We present a novel sequentially implemented Monte Carlo methodology for the computation of, theoretical and empirical, depth functions and related quantities (seMCD), that outputs an interval, a so-called seMCD-bucket, to which the quantity of interest belongs with a high probability prespecified by the user. For specific classes of depth functions, we adapt algorithms from sequential testing, providing finite-sample guarantees. For depth functions dependent on unknown distributions, we offer asymptotic guarantees using non-parametric statistical methods. In contrast to plain-vanilla Monte Carlo methodology the number of samples required in the algorithm is random but typically much smaller than standard choices suggested in the literature. The seMCD method can be applied to various depth functions, covering multivariate and functional spaces. We demonstrate the efficiency and reliability of our approach through empirical studies, highlighting its applicability in outlier or anomaly detection, classification, and depth region computation. In conclusion, the seMCD-algorithm can achieve accurate depth approximations with few Monte Carlo samples while maintaining rigorous statistical guarantees.