This work introduces Dirichlet Active Learning (DiAL), a Bayesian-inspired approach to the design of active learning algorithms. Our framework models feature-conditional class probabilities as a Dirichlet random field and shares observational strength between similar features in order to calibrate the random field. This random field can then be used in learning tasks: in particular, we can use current estimates of mean and variance to conduct classification and active learning in contexts where labeled data is scarce. We demonstrate the applicability of this model to low-label-rate graph learning by constructing ``propagation operators'' based upon the graph Laplacian, and offer computational studies demonstrating the method's competitiveness with the state of the art. Finally, we provide rigorous guarantees regarding the ability of this approach to ensure both exploration and exploitation, expressed respectively in terms of cluster exploration and increased attention to decision boundaries.
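As a minimal illustration of the mean and variance estimates mentioned above (using only the standard moments of the Dirichlet distribution; the construction and calibration of the random field are specific to the paper): if the class probabilities at a feature follow a Dirichlet distribution with parameters $\alpha = (\alpha_1, \dots, \alpha_K)$ and $\alpha_0 = \sum_{j=1}^K \alpha_j$, then
\[
\mathbb{E}[p_k] = \frac{\alpha_k}{\alpha_0}, \qquad \operatorname{Var}(p_k) = \frac{\alpha_k(\alpha_0 - \alpha_k)}{\alpha_0^2(\alpha_0 + 1)},
\]
so accumulating (propagated) label observations increases $\alpha_0$ and shrinks the variance, which is the mechanism an acquisition rule balancing exploration and exploitation can exploit.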
The rapid advancement of Generative Adversarial Networks (GANs) has created a pressing need to robustly evaluate these models. Among the established evaluation criteria, the Fr\'{e}chet Inception Distance (FID) has been widely adopted due to its conceptual simplicity, fast computation time, and strong correlation with human perception. However, FID has inherent limitations, mainly stemming from its assumption that feature embeddings follow a Gaussian distribution, and therefore can be fully described by their first two moments. As this does not hold in practice, in this paper we explore the importance of third moments in image feature data and use this information to define a new measure, which we call the Skew Inception Distance (SID). We prove that SID is a pseudometric on probability distributions, show how it extends FID, and present a practical method for its computation. Our numerical experiments show that SID either tracks with FID or, in some cases, aligns more closely with human perception when evaluating image features of ImageNet data.
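For context, FID is the squared $2$-Wasserstein distance between Gaussians fitted to the two sets of Inception embeddings, which admits the closed form
\[
\mathrm{FID}\big((\mu_1, \Sigma_1), (\mu_2, \Sigma_2)\big) = \|\mu_1 - \mu_2\|_2^2 + \operatorname{Tr}\!\big(\Sigma_1 + \Sigma_2 - 2(\Sigma_1 \Sigma_2)^{1/2}\big).
\]
SID augments this two-moment comparison with third-moment (skewness) information; the precise construction is given in the paper.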
In the (special) smoothing spline problem one considers a variational problem with a quadratic data fidelity penalty and Laplacian regularisation. Higher order regularity can be obtained by replacing the Laplacian regulariser with a poly-Laplacian regulariser. The methodology is readily adapted to graphs, and here we consider graph poly-Laplacian regularisation in a fully supervised, non-parametric, noise-corrupted regression problem. In particular, given a dataset $\{x_i\}_{i=1}^n$ and a set of noisy labels $\{y_i\}_{i=1}^n\subset\mathbb{R}$ we let $u_n:\{x_i\}_{i=1}^n\to\mathbb{R}$ be the minimiser of an energy which consists of a data fidelity term and an appropriately scaled graph poly-Laplacian term. When $y_i = g(x_i)+\xi_i$, for iid noise $\xi_i$, and using the geometric random graph, we identify (with high probability) the rate of convergence of $u_n$ to $g$ in the large data limit $n\to\infty$. Furthermore, our rate, up to logarithms, coincides with the known rate of convergence in the usual smoothing spline model.
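Schematically (with the precise scaling of the regularisation parameter left to the paper), the minimised energy takes the form
\[
E_n(u) = \frac{1}{n} \sum_{i=1}^n \big( u(x_i) - y_i \big)^2 + \tau_n \, \langle u, \Delta_n^{s} u \rangle,
\]
where $\Delta_n$ is the graph Laplacian of the geometric random graph, $s$ is the order of the poly-Laplacian, and $\tau_n$ is the regularisation strength.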
Statistical depths provide a fundamental generalization of quantiles and medians to data in higher dimensions. This paper proposes a new type of globally defined statistical depth, based upon control theory and eikonal equations, which measures the smallest amount of probability density that has to be passed through in a path to points outside the support of the distribution: for example, spatial infinity. This depth is easy to interpret and compute, expressively captures multi-modal behavior, and extends naturally to data that is non-Euclidean. We prove various properties of this depth, and provide discussion of computational considerations. In particular, we demonstrate that this notion of depth is robust under an approximately isometrically constrained adversarial model, a property which is not enjoyed by the Tukey depth. Finally, we give some illustrative examples in the context of two-dimensional mixture models and MNIST.
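In schematic form (suppressing the regularity assumptions stated in the paper), the depth of a point $x$ is the least density-weighted cost of escaping the support:
\[
T(x) = \inf \Big\{ \int_0^L \rho(\gamma(t)) \, dt \; : \; \gamma \text{ unit speed}, \; \gamma(0) = x, \; \gamma(L) \notin \operatorname{supp}(\rho) \Big\},
\]
which, by standard control-theoretic arguments, is the viscosity solution of the eikonal equation $|\nabla T| = \rho$ with $T = 0$ outside the support.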
We establish an equivalence between a family of adversarial training problems for non-parametric binary classification and a family of regularized risk minimization problems where the regularizer is a nonlocal perimeter functional. The resulting regularized risk minimization problems admit exact convex relaxations of the type $L^1+$ (nonlocal) $\operatorname{TV}$, a form frequently studied in image analysis and graph-based learning. This reformulation reveals a rich geometric structure, which in turn allows us to establish a series of properties of optimal solutions of the original problem, including the existence of minimal and maximal solutions (interpreted in a suitable sense) and the existence of regular solutions (also interpreted in a suitable sense). In addition, we highlight how the connection between adversarial training and perimeter minimization problems provides a novel, directly interpretable, statistical motivation for a family of regularized risk minimization problems involving perimeter/total variation. The majority of our theoretical results are independent of the distance used to define adversarial attacks.
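The equivalence can be summarized schematically (in our notation, not necessarily the paper's): for a classifier given by a set $A$, the adversarial risk decomposes as
\[
\mathbb{E} \Big[ \sup_{\tilde{x} \, : \, d(\tilde{x}, x) \le \varepsilon} \mathbf{1}\{ y \neq \mathbf{1}_A(\tilde{x}) \} \Big] = R(A) + \varepsilon \operatorname{Per}_\varepsilon(A),
\]
where $R(A)$ is the unperturbed risk and $\operatorname{Per}_\varepsilon$ is a nonlocal perimeter functional concentrating on the $\varepsilon$-neighborhood of the decision boundary.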
The widespread application of modern machine learning has increased the need for robust statistical algorithms. This work studies one such fundamental statistical measure, known as the Tukey depth. We study the problem in the continuum (population) limit. In particular, we derive the associated necessary conditions, which take the form of a first-order partial differential equation. We discuss the classical interpretation of this necessary condition as the viscosity solution of a Hamilton-Jacobi equation, but with a non-classical Hamiltonian that depends discontinuously on the gradient at zero. We prove that this equation possesses a unique viscosity solution and that this solution always bounds the Tukey depth from below. In certain cases, we prove that the Tukey depth is equal to the viscosity solution, and we give some illustrations of standard numerical methods from the optimal control community that deal directly with the partial differential equation. We conclude by outlining several promising research directions, both in terms of new numerical algorithms and theoretical challenges.
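For reference, the Tukey (halfspace) depth of a point $x$ with respect to a probability measure $\mu$ on $\mathbb{R}^d$ is
\[
T(x) = \inf_{\|v\| = 1} \mu\big( \{ y \in \mathbb{R}^d : \langle y - x, v \rangle \ge 0 \} \big),
\]
the smallest mass assigned to a closed halfspace whose boundary passes through $x$; the first-order necessary conditions discussed above arise from this inner minimization over directions $v$.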
We study a version of adversarial classification in which an adversary is empowered to corrupt data inputs up to some distance $\varepsilon$, using tools from variational analysis. In particular, we describe necessary conditions associated with the optimal classifier subject to such an adversary. Using the necessary conditions, we derive a geometric evolution equation which can be used to track the change in classification boundaries as $\varepsilon$ varies. This evolution equation may be described as an uncoupled system of differential equations in one dimension, or as a mean curvature type equation in higher dimensions. In one dimension, we rigorously prove that one can use the initial value problem starting from $\varepsilon=0$, which is simply the Bayes classifier, in order to solve for the global minimizer of the adversarial problem. Numerical examples illustrating these ideas are also presented.
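As a simple anchor for this evolution (a standard fact about Bayes classifiers, assuming both classes have densities): at $\varepsilon = 0$ and in one dimension, with class weights $w_0, w_1$ and class-conditional densities $\rho_0, \rho_1$, a decision boundary point $x$ of the Bayes classifier satisfies
\[
w_0 \, \rho_0(x) = w_1 \, \rho_1(x),
\]
which provides the initial condition for the initial value problem described above.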
In this work we study statistical properties of graph-based clustering algorithms that rely on the optimization of balanced graph cuts, the main example being the optimization of Cheeger cuts. We consider proximity graphs built from data sampled from an underlying distribution supported on a generic smooth compact manifold $M$. In this setting, we obtain high-probability convergence rates for both the Cheeger constant and the associated Cheeger cuts towards their continuum counterparts. The key technical tools are careful estimates of interpolation operators which lift empirical Cheeger cuts to the continuum, as well as continuum stability estimates for isoperimetric problems. To our knowledge, the quantitative estimates obtained here are the first of their kind.
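For orientation, the continuum and empirical objects being compared are, in standard notation,
\[
h(M) = \inf_{A \subset M} \frac{\operatorname{Per}(A)}{\min\big( \mu(A), \mu(A^c) \big)}, \qquad h_n = \min_{S \subset \{x_1, \dots, x_n\}} \frac{\operatorname{Cut}(S, S^c)}{\min\big( |S|, |S^c| \big)},
\]
where $\operatorname{Cut}(S, S^c)$ sums the weights of the proximity-graph edges between $S$ and its complement; the convergence rates quantify $h_n \to h(M)$ (after appropriate rescaling) as $n \to \infty$.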
We study asymptotic consistency guarantees for a non-parametric regression problem with Laplacian regularization. In particular, we consider $(x_1, y_1), \dots, (x_n, y_n)$ samples from some distribution on the product space $\mathcal{M} \times \mathbb{R}$, where $\mathcal{M}$ is an $m$-dimensional manifold embedded in $\mathbb{R}^d$. A geometric graph on the point cloud $\{x_1, \dots, x_n \}$ is constructed by connecting points that are within some specified distance $\varepsilon_n$. A suitable semi-linear equation involving the resulting graph Laplacian is used to obtain a regressor for the observed values of $y$. We establish probabilistic error rates for the uniform difference between the regressor constructed from the observed data and the Bayes regressor (or trend) associated to the ground-truth distribution. We give the explicit dependence of the rates in terms of the connectivity parameter $\varepsilon_n$, the strength of regularization $\beta_n$, and the number of data points $n$. Our argument relies on a simple, yet powerful, maximum principle for the graph Laplacian. We also address a simple extension of the framework to a semi-supervised setting.
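Schematically (our rendering; the precise semi-linear equation is specified in the paper), the regressor can be viewed as a minimizer of the form
\[
u_n \in \operatorname*{arg\,min}_{u : \{x_i\} \to \mathbb{R}} \; \frac{1}{n} \sum_{i=1}^n \ell\big( u(x_i) - y_i \big) + \beta_n \, \langle u, \Delta_n u \rangle,
\]
whose first-order optimality condition is a semi-linear equation in the graph Laplacian $\Delta_n$; the maximum principle mentioned above applies directly to equations of this type.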
This work considers the problem of binary classification: given training data $x_1, \dots, x_n$ from a certain population, together with associated labels $y_1,\dots, y_n \in \left\{0,1 \right\}$, determine the best label for an element $x$ not among the training data. More specifically, this work considers a variant of the regularized empirical risk functional which is defined intrinsically to the observed data and does not depend on the underlying population. Tools from modern analysis are used to obtain a concise proof of asymptotic consistency as regularization parameters are taken to zero at rates related to the size of the sample. These analytical tools give a new framework for understanding overfitting and underfitting, and rigorously connect the notion of overfitting with a loss of compactness.
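A schematic instance of such an intrinsic functional (illustrative only; the paper's precise fidelity and regularizer may differ) is
\[
\min_{u : \{x_i\} \to \mathbb{R}} \; \frac{1}{n} \sum_{i=1}^n \big| u(x_i) - y_i \big| + \lambda_n \operatorname{GTV}_n(u),
\]
where $\operatorname{GTV}_n$ is a graph total variation built from pairwise comparisons within the sample; in this picture, overfitting corresponds to the regularization parameter $\lambda_n$ vanishing too quickly relative to $n$, which is where compactness is lost.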