Abstract: The quantity of interest in the classical Cramér-Rao theory of unbiased estimation (e.g., the Cramér-Rao lower bound, its exact attainment for exponential families, and the asymptotic efficiency of maximum likelihood estimation) is the variance, which represents the instability of an estimator when its value is compared to its value for an independently sampled data set from the same distribution. In this paper we are interested in a quantity which represents the instability of an estimator when its value is compared to its value for an infinitesimal additive perturbation of the original data set; we refer to this as the "sensitivity" of the estimator. The resulting theory of sensitivity is based on the Wasserstein geometry in the same way that the classical theory of variance is based on the Fisher-Rao (equivalently, Hellinger) geometry, and this insight allows us to derive a collection of results analogous to the classical ones: a Wasserstein-Cramér-Rao lower bound on the sensitivity of any unbiased estimator, a characterization of the models in which there exist unbiased estimators achieving the lower bound exactly, and concrete results showing that the Wasserstein projection estimator achieves the lower bound asymptotically. We use these results to treat many statistical examples, sometimes revealing new optimality properties of existing estimators and other times revealing entirely new estimators.
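For context, the classical Cramér-Rao inequality states that any unbiased estimator $T$ of a parameter $\theta$, computed from $n$ i.i.d. samples, satisfies
\[
\operatorname{Var}_\theta\big(T(X_1,\dots,X_n)\big) \;\ge\; \frac{1}{n\,I(\theta)},
\qquad
I(\theta) \;=\; \mathbb{E}_\theta\!\left[\big(\partial_\theta \log p_\theta(X)\big)^2\right].
\]
By analogy, and purely as an illustrative formalization (not necessarily the paper's exact definition), the "sensitivity" described in the abstract may be sketched as the expected first-order response of $T$ to an additive perturbation of each data point,
\[
S_\theta(T) \;=\; \mathbb{E}_\theta\!\left[\sum_{i=1}^{n} \big\|\nabla_{x_i} T(X_1,\dots,X_n)\big\|^2\right],
\]
so that the Wasserstein-Cramér-Rao bound announced above plays the role for $S_\theta(T)$ that the classical bound plays for the variance.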
Abstract: A celebrated result of Pollard establishes the asymptotic consistency of $k$-means clustering when the population distribution has finite variance. In this work, we point out that the population-level $k$-means clustering problem is in fact well-posed under the weaker assumption of a finite first moment, and we investigate whether some form of asymptotic consistency holds in this setting. As we illustrate through a variety of negative results, the complete story is quite subtle; for example, the empirical $k$-means cluster centers may fail to converge even when there exists a unique set of population $k$-means cluster centers. A detailed analysis of our negative results reveals that inconsistency arises from an extreme form of cluster imbalance, whereby the presence of outlying samples leads some empirical $k$-means clusters to contain very few points. We then give a collection of positive results showing that some forms of asymptotic consistency, under only the assumption of a finite first moment, can be recovered by imposing an a priori degree of balance among the empirical $k$-means clusters.
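To see why the population problem is well-posed under only a finite first moment (a sketch of the standard normalization trick; the notation here is ours, not quoted from the paper), note that $\mathbb{E}\big[\min_j \|X - c_j\|^2\big]$ may be infinite, yet the relative objective
\[
R(c_1,\dots,c_k) \;=\; \mathbb{E}\!\left[\min_{1\le j\le k}\|X - c_j\|^2 \;-\; \|X\|^2\right]
\]
is always well-defined when $\mathbb{E}\|X\| < \infty$, since expanding the square gives
\[
\Big|\min_{1\le j\le k}\|X - c_j\|^2 - \|X\|^2\Big| \;\le\; 2\,\|X\|\max_j\|c_j\| \;+\; \max_j\|c_j\|^2,
\]
which is integrable. Minimizing $R$ over center sets recovers the usual $k$-means problem whenever the classical objective is finite, as the two differ only by the constant $\mathbb{E}\|X\|^2$.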
Abstract: We introduce a class of clustering procedures which includes $k$-means and $k$-medians, as well as variants in which the domain of the cluster centers is chosen adaptively (for example, $k$-medoids) or the number of cluster centers is chosen adaptively (for example, according to the elbow method). In the non-parametric setting, and assuming only the finiteness of certain moments, we show that all clustering procedures in this class are strongly consistent under i.i.d. samples. Our method of proof is to directly study the continuity of various deterministic maps associated with these clustering procedures, and to show that strong consistency simply descends from the analogous strong consistency of the empirical measures. In the adaptive setting, our work provides the first strong consistency result of its kind. In the non-adaptive setting, our work strengthens Pollard's classical result by dispensing with various unnecessary technical hypotheses, by upgrading the notion of strong consistency obtained, and by using the same methods to prove further limit theorems.
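As a minimal sketch of the common template behind this class (the exponent $p$ and the notation are illustrative assumptions, not quoted from the paper): given a probability measure $\mu$, each procedure selects a set of centers solving
\[
c^\star(\mu) \;\in\; \operatorname*{arg\,min}_{\{c_1,\dots,c_k\}\subset D} \;\int \min_{1\le j\le k} \|x - c_j\|^p \, \mathrm{d}\mu(x),
\]
where $p = 2$ recovers $k$-means and $p = 1$ recovers $k$-medians; adaptivity enters by letting the admissible domain $D$ depend on $\mu$ (as in $k$-medoids, where the centers are constrained to the support of $\mu$) or by letting $k$ itself be chosen from $\mu$. In this template, strong consistency amounts to showing that $\mu \mapsto c^\star(\mu)$ is suitably continuous, so that the conclusion descends directly from the almost-sure convergence $\mu_n \to \mu$ of the empirical measures.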