We study the convergence rates of empirical Bayes posterior distributions for nonparametric and high-dimensional inference. We show that as long as the hyperparameter set is discrete, the empirical Bayes posterior distribution induced by the maximum marginal likelihood estimator can be regarded as a variational approximation to a hierarchical Bayes posterior distribution. This connection between empirical Bayes and variational Bayes allows us to leverage the recent results in the variational Bayes literature, and directly obtains the convergence rates of empirical Bayes posterior distributions from a variational perspective. For a more general hyperparameter set that is not necessarily discrete, we introduce a new technique called "prior decomposition" to deal with prior distributions that can be written as convex combinations of probability measures whose supports are low-dimensional subspaces. This leads to generalized versions of the classical "prior mass and testing" conditions for the convergence rates of empirical Bayes. Our theory is applied to a number of statistical estimation problems including nonparametric density estimation and sparse linear regression.
We study convergence rates of variational posterior distributions for nonparametric and high-dimensional inference. We formulate general conditions on prior, likelihood, and variational class that characterize the convergence rates. Under similar "prior mass and testing" conditions considered in the literature, the rate is found to be the sum of two terms. The first term stands for the convergence rate of the true posterior distribution, and the second term is contributed by the variational approximation error. For a class of priors that admit the structure of a mixture of product measures, we propose a novel prior mass condition, under which the variational approximation error of the generalized mean-field class is dominated by convergence rate of the true posterior. We demonstrate the applicability of our general results for various models, prior distributions and variational classes by deriving convergence rates of the corresponding variational posteriors.