Abstract:We consider the problems of \emph{learning} and \emph{testing} real-valued convex functions over Gaussian space. Despite the extensive study of function convexity across mathematics, statistics, and computer science, its learnability and testability have largely been examined only in discrete or restricted settings -- typically with respect to the Hamming distance, which is ill-suited for real-valued functions. In contrast, we study these problems in high dimensions under the standard Gaussian measure, assuming sample access to the function and a mild smoothness condition, namely Lipschitzness. A smoothness assumption is natural and, in fact, necessary even in one dimension: without it, convexity cannot be inferred from finitely many samples. As our main results, we give: - Learning Convex Functions: An agnostic proper learning algorithm for Lipschitz convex functions that achieves error $\varepsilon$ using $n^{O(1/\varepsilon^2)}$ samples, together with a complementary lower bound of $n^{\mathrm{poly}(1/\varepsilon)}$ samples in the \emph{correlational statistical query (CSQ)} model. - Testing Convex Functions: A tolerant (two-sided) tester for convexity of Lipschitz functions with the same sample complexity (as a corollary of our learning result), and a one-sided tester (which never rejects convex functions) using $O(\sqrt{n}/\varepsilon)^n$ samples.
Abstract:We give a dimension-independent sparsification result for suprema of centered Gaussian processes: Let $T$ be any (possibly infinite) bounded set of vectors in $\mathbb{R}^n$, and let $\{{\boldsymbol{X}}_t\}_{t\in T}$ be the canonical Gaussian process on $T$. We show that there is an $O_\varepsilon(1)$-size subset $S \subseteq T$ and a set of real values $\{c_s\}_{s \in S}$ such that $\sup_{s \in S} \{{\boldsymbol{X}}_s + c_s\}$ is an $\varepsilon$-approximator of $\sup_{t \in T} {\boldsymbol{X}}_t$. Notably, the size of $S$ is completely independent of both the size of $T$ and of the ambient dimension $n$. We use this to show that every norm is essentially a junta when viewed as a function over Gaussian space: Given any norm $\nu(x)$ on $\mathbb{R}^n$, there is another norm $\psi(x)$ which depends only on the projection of $x$ along $O_\varepsilon(1)$ directions, for which $\psi({\boldsymbol{g}})$ is a multiplicative $(1 \pm \varepsilon)$-approximation of $\nu({\boldsymbol{g}})$ with probability $1-\varepsilon$ for ${\boldsymbol{g}} \sim N(0,I_n)$. We also use our sparsification result for suprema of centered Gaussian processes to give a sparsification lemma for convex sets of bounded geometric width: Any intersection of (possibly infinitely many) halfspaces in $\mathbb{R}^n$ that are at distance $O(1)$ from the origin is $\varepsilon$-close, under $N(0,I_n)$, to an intersection of only $O_\varepsilon(1)$ many halfspaces. We describe applications to agnostic learning and tolerant property testing.