Gaussian processes provide a framework for nonlinear nonparametric Bayesian inference widely applicable across science and engineering. Unfortunately, their computational burden scales cubically with the training sample size, which in the case that samples arrive in perpetuity, approaches infinity. This issue necessitates approximations for use with streaming data, which to date mostly lack convergence guarantees. Thus, we develop the first online Gaussian process approximation that preserves convergence to the population posterior, i.e., asymptotic posterior consistency, while ameliorating its intractable complexity growth with the sample size. We propose an online compression scheme that, following each a posteriori update, fixes an error neighborhood with respect to the Hellinger metric centered at the current posterior, and greedily tosses out past kernel dictionary elements until its boundary is hit. We call the resulting method Parsimonious Online Gaussian Processes (POG). For diminishing error radius, exact asymptotic consistency is preserved (Theorem 1(i)) at the cost of unbounded memory in the limit. On the other hand, for constant error radius, POG converges to a neighborhood of the population posterior (Theorem 1(ii))but with finite memory at-worst determined by the metric entropy of the feature space (Theorem 2). Experimental results are presented on several nonlinear regression problems which illuminates the merits of this approach as compared with alternatives that fix the subspace dimension defining the history of past points.
We consider the framework of learning over decentralized networks, where nodes observe unique, possibly correlated, observation streams. We focus on the case where agents learn a regression \emph{function} that belongs to a reproducing kernel Hilbert space (RKHS). In this setting, a decentralized network aims to learn nonlinear statistical models that are optimal in terms of a global stochastic convex functional that aggregates data across the network, with only access to a local data stream. We incentivize coordination while respecting network heterogeneity through the introduction of nonlinear proximity constraints. To solve it, we propose applying a functional variant of stochastic primal-dual (Arrow-Hurwicz) method which yields a decentralized algorithm. To handle the fact that the RKHS parameterization has complexity proportionate with the iteration index, we project the primal iterates onto Hilbert subspaces that are greedily constructed from the observation sequence of each node. The resulting proximal stochastic variant of Arrow-Hurwicz, dubbed Heterogeneous Adaptive Learning with Kernels (HALK), is shown to converge in expectation, both in terms of primal sub-optimality and constraint violation to a neighborhood that depends on a given constant step-size selection. Simulations on a correlated spatio-temporal random field estimation problem validate our theoretical results, which are born out in practice for networked oceanic sensing buoys estimating temperature and salinity from depth measurements.