Regularization plays a pivotal role in ill-posed machine learning and inverse problems. However, a fundamental comparative analysis of regularization norms remains open. We establish a small noise analysis framework to assess the effects of norms in Tikhonov and RKHS regularizations, in the context of ill-posed linear inverse problems with Gaussian noise. The framework studies the convergence rates of regularized estimators in the small noise limit and reveals a potential instability of the conventional L2 regularizer. We resolve this instability by introducing a new class of adaptive fractional RKHS regularizers, which covers the L2 Tikhonov and RKHS regularizations by adjusting the fractional smoothness parameter. A surprising insight is that over-smoothing via these fractional RKHSs consistently yields optimal convergence rates, but the optimal hyper-parameter may decay too quickly to be selected in practice.
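As a minimal illustration of the setup (a sketch, not the paper's implementation), the snippet below applies a fractional RKHS-Tikhonov penalty to a discretized linear inverse problem; the covariance-like matrix C, the function name fractional_tikhonov, and the toy data are assumptions made here for the example, with s = 0 recovering the plain L2 penalty.

```python
import numpy as np

# Sketch: fractional RKHS-Tikhonov regularization for A x = b + noise.
# The penalty ||x||_s^2 = x^T C^{-s} x uses a fractional power s of an assumed
# symmetric positive-definite matrix C; larger s means stronger smoothing.

def fractional_tikhonov(A, b, C, s, lam):
    """Solve min_x ||A x - b||^2 + lam * x^T C^{-s} x via the normal equations."""
    w, V = np.linalg.eigh(C)                # eigen-decomposition of C
    C_neg_s = (V * w ** (-s)) @ V.T         # fractional power C^{-s}
    return np.linalg.solve(A.T @ A + lam * C_neg_s, A.T @ b)

# toy usage with synthetic data
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = rng.standard_normal(20)
b = A @ x_true + 1e-2 * rng.standard_normal(50)
C = A.T @ A + 1e-8 * np.eye(20)             # placeholder choice of C
x_hat = fractional_tikhonov(A, b, C, s=1.0, lam=1e-3)
```

In a small noise study one would shrink the noise level and the hyper-parameter lam together and track the error of x_hat against x_true.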
Kernels are efficient in representing nonlocal dependence, and they are widely used to design operators between function spaces. Thus, learning kernels in operators from data is an inverse problem of general interest. Due to the nonlocal dependence, the inverse problem can be severely ill-posed, with a data-dependent singular inversion operator. The Bayesian approach overcomes the ill-posedness through a non-degenerate prior. However, a fixed non-degenerate prior leads to a divergent posterior mean when the observation noise becomes small, if the data induce a perturbation in the eigenspace of zero eigenvalues of the inversion operator. We introduce a data-adaptive prior to achieve a stable posterior whose mean always has a small noise limit. The covariance of the data-adaptive prior is the inversion operator, with a hyper-parameter selected adaptively to the data by the L-curve method. Furthermore, we provide a detailed analysis of the computational practice of the data-adaptive prior and demonstrate it on Toeplitz matrices and integral operators. Numerical tests show that a fixed prior can lead to a divergent posterior mean in the presence of any of four types of errors: discretization error, model error, partial observations, and an incorrect noise assumption. In contrast, the data-adaptive prior always attains posterior means with small noise limits.
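Under assumed notation (not the paper's code), the sketch below computes a posterior mean when the prior is Gaussian with covariance proportional to the normal ("inversion") matrix A^T A; the mode-wise shrinkage formula is derived here for this Gaussian setup, and posterior_mean, sigma2, and lam are illustrative names.

```python
import numpy as np

# Sketch: posterior mean under a data-adaptive prior N(0, lam^{-1} * A_bar),
# where A_bar = A^T A. The prior is supported on the range of A_bar, so noise
# components in its null space cannot inflate the posterior mean. The
# hyper-parameter lam would be selected adaptively, e.g. by an L-curve criterion.

def posterior_mean(A, b, sigma2, lam, tol=1e-12):
    A_bar, rhs = A.T @ A, A.T @ b
    mu, V = np.linalg.eigh(A_bar)            # spectral decomposition of A_bar
    keep = mu > tol * mu.max()               # project onto the range of A_bar
    mu, V = mu[keep], V[:, keep]
    coef = V.T @ rhs
    # mode-wise shrinkage: mu_i * coef_i / (mu_i^2 + lam * sigma2)
    return V @ (mu * coef / (mu**2 + lam * sigma2))

# toy ill-conditioned problem
rng = np.random.default_rng(1)
A = rng.standard_normal((40, 40)) @ np.diag(np.linspace(1, 1e-6, 40))
b = A @ rng.standard_normal(40) + 1e-3 * rng.standard_normal(40)
m = posterior_mean(A, b, sigma2=1e-6, lam=1e-2)
```

A fixed non-degenerate prior would instead keep the null-space modes in the solve, which is where the divergence described above can originate.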
We present DARTR: a Data Adaptive RKHS Tikhonov Regularization method for the linear inverse problem of nonparametric learning of function parameters in operators. A key ingredient is a system intrinsic data-adaptive (SIDA) RKHS, whose norm restricts the learning to the function space of identifiability. DARTR uses this norm and selects the regularization parameter by the L-curve method. We illustrate its performance on examples including integral operators, nonlinear operators, and nonlocal operators, with discrete synthetic data. Numerical results show that DARTR yields an accurate estimator that is robust to both the numerical error due to discrete data and the noise in the data, and that the estimator converges at a consistent rate as the data mesh refines under different noise levels, outperforming two baseline regularizers based on the $l^2$ and $L^2$ norms.
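One plausible realization of this idea (a sketch under assumed notation, not the authors' implementation) represents the SIDA-RKHS norm through the generalized eigen-pairs of the normal matrix A_bar with respect to the basis Gram matrix B, so that components outside the function space of identifiability are penalized away; dartr_estimate and the tolerance rule are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh

# Sketch of a DARTR-style estimator: regularize the quadratic loss
#   c^T A_bar c - 2 b^T c
# with an RKHS-type norm built from the generalized eigen-pairs
#   A_bar v = sigma * B v,
# where B is the Gram matrix of the basis under the exploration measure.

def dartr_estimate(A_bar, b, B, lam, tol=1e-12):
    sigma, V = eigh(A_bar, B)                  # generalized eigen-decomposition
    keep = sigma > tol * sigma.max()           # drop the non-identifiable modes
    sigma, V = sigma[keep], V[:, keep]         # columns satisfy V^T B V = I
    # In these coordinates the regularized loss decouples per mode:
    #   sigma_k y_k^2 - 2 (V^T b)_k y_k + lam * y_k^2 / sigma_k,
    # whose minimizer gives the closed form below.
    y = (V.T @ b) / (sigma + lam / sigma)
    return V @ y                               # coefficient vector of the estimator
```

The regularization parameter lam would then be chosen by the L-curve method, as in the comparison described above.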
We study the identifiability of the interaction kernels in mean-field equations for interacting particle systems. The key is to identify function spaces on which a probabilistic loss functional has a unique minimizer. We prove that identifiability holds on any subspace of two reproducing kernel Hilbert spaces (RKHS) whose reproducing kernels are intrinsic to the system and are data-adaptive. Furthermore, identifiability holds on two ambient $L^2$ spaces if and only if the integral operators associated with the reproducing kernels are strictly positive. Thus, the inverse problem is ill-posed in general. We also discuss the implications of identifiability for computational practice.
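In schematic form, with notation assumed here for illustration, the probabilistic loss functional is quadratic in the kernel,
\[
\mathcal{E}(\phi) \;=\; \langle L_{\bar G}\,\phi,\; \phi\rangle_{L^2(\rho)} \;-\; 2\,\langle \phi^f,\; \phi\rangle_{L^2(\rho)} \;+\; \mathrm{const},
\]
so its minimizers solve the operator equation $L_{\bar G}\,\phi = \phi^f$. The minimizer is therefore unique on a subspace exactly when the integral operator $L_{\bar G}$ associated with the reproducing kernel is strictly positive on that subspace, which is the dichotomy stated above.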
We introduce a nonparametric algorithm to learn interaction kernels of mean-field equations for first-order systems of interacting particles. The data consist of discrete space-time observations of the solution. Using regularized least squares, the algorithm efficiently learns the kernel on data-adaptive hypothesis spaces. A key ingredient is a probabilistic error functional derived from the likelihood of the diffusion process associated with the mean-field equation. Under an identifiability condition, the estimator converges in a reproducing kernel Hilbert space and an $L^2$ space at a rate that is optimal in the sense that it equals the order of the numerical integrator. We demonstrate our algorithm on three typical examples: opinion dynamics with a piecewise linear kernel, the granular media model with a quadratic kernel, and an aggregation-diffusion equation with a repulsive-attractive kernel.
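As a sketch of one ingredient, the data-adaptive hypothesis space, the snippet below places knots of a piecewise-linear basis at empirical quantiles of the pairwise distances explored by the particle data; the knot-placement rule, the function names, and the array layout are assumptions for illustration rather than the authors' construction.

```python
import numpy as np

# Sketch: a data-adaptive hypothesis space for the interaction kernel, built
# from the pairwise distances actually explored by the observed particles.

def adaptive_knots(positions, n_knots):
    """positions: array of shape (T, N, d) of discrete space-time observations."""
    T, N, _ = positions.shape
    dists = []
    for t in range(T):
        diff = positions[t][:, None, :] - positions[t][None, :, :]
        r = np.linalg.norm(diff, axis=-1)
        dists.append(r[np.triu_indices(N, k=1)])   # unique pairwise distances
    return np.quantile(np.concatenate(dists), np.linspace(0, 1, n_knots))

def hat_basis(r, knots):
    """Evaluate piecewise-linear hat functions at the distances r."""
    B = np.zeros((len(r), len(knots)))
    for i, k in enumerate(knots):
        left = knots[i - 1] if i > 0 else k
        right = knots[i + 1] if i < len(knots) - 1 else k
        rise = (r - left) / (k - left) if k > left else np.ones_like(r)
        fall = (right - r) / (right - k) if right > k else np.ones_like(r)
        B[:, i] = np.clip(np.minimum(rise, fall), 0.0, 1.0)
    return B
```

With the basis in hand, the regression quantities of the probabilistic error functional are assembled on these knots and the coefficients are obtained by regularized least squares.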