Abstract: We advocate for a new statistical principle that combines the most desirable aspects of both parameter inference and density estimation. This leads us to the predictively oriented (PrO) posterior, which expresses uncertainty as a consequence of predictive ability. The resulting inferences predictively dominate both classical and generalised Bayes posterior predictive distributions: up to logarithmic factors, PrO posteriors converge to the predictively optimal model average at rate $n^{-1/2}$. Whereas classical and generalised Bayes posteriors achieve this rate only if the model can recover the data-generating process, PrO posteriors adapt to the level of model misspecification. Specifically, if the model can recover the data-generating distribution, they concentrate around the true model at rate $n^{-1/2}$ in the same way as Bayes and Gibbs posteriors, but they do \textit{not} concentrate in the presence of non-trivial forms of model misspecification. Instead, they stabilise towards a predictively optimal posterior whose degree of irreducible uncertainty can be interpreted as the degree of model misspecification, in sharp contrast to how Bayesian uncertainty and its existing extensions behave. Lastly, we show that one can sample from PrO posteriors by evolving particles according to mean-field Langevin dynamics, and we verify the practical significance of our theoretical developments on a number of numerical examples.
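
To make the particle-based sampling idea concrete, the sketch below runs mean-field Langevin dynamics on a toy problem. The objective is an assumption for illustration only and is not claimed to be the definition of the PrO posterior: we minimise F(q) = -(1/n) Σ_i log m_q(x_i) + λ KL(q || π), where m_q(x) = ∫ N(x; θ, 1) q(dθ) is the model-averaged predictive and π = N(0, prior_sd²). The model N(θ, 1) is deliberately misspecified for data drawn from N(0, 4), so the predictively optimal average needs a spread-out q rather than a point mass. The names `mfld_mixture`, `lam`, `prior_sd`, and `step` are hypothetical.

```python
import numpy as np

def normal_pdf(x, mu):
    """Density of N(mu, 1), evaluated elementwise (unit-variance location model)."""
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)

def mfld_mixture(x, n_particles=100, lam=0.01, prior_sd=10.0,
                 step=0.05, n_steps=1500, seed=0):
    """Particle mean-field Langevin dynamics for the ASSUMED objective
    F(q) = -mean_i log m_q(x_i) + lam * KL(q || N(0, prior_sd^2)),
    with m_q(x) = integral of N(x; theta, 1) q(d theta)."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, 0.1, size=n_particles)        # start nearly concentrated
    for _ in range(n_steps):
        dens = normal_pdf(x[:, None], theta[None, :])      # (n, N): p_{theta_j}(x_i)
        mix = dens.mean(axis=1, keepdims=True)             # (n, 1): particle estimate of m_q(x_i)
        dp_dtheta = (x[:, None] - theta[None, :]) * dens   # d/dtheta of N(x_i; theta_j, 1)
        drift = (dp_dtheta / mix).mean(axis=0)             # minus grad of the data term's first variation
        drift += -lam * theta / prior_sd**2                # lam * grad log prior
        noise = rng.normal(size=n_particles)
        theta = theta + step * drift + np.sqrt(2.0 * lam * step) * noise
    return theta

rng = np.random.default_rng(42)
x = rng.normal(0.0, 2.0, size=300)   # data from N(0, 4); the N(theta, 1) model is misspecified
theta = mfld_mixture(x)
print(theta.var())                   # particle spread stays well above zero
```

In this toy setting the particle cloud does not collapse as n grows; its spread compensates for the variance the N(θ, 1) model cannot express, which mirrors qualitatively (not quantitatively) the non-concentration behaviour described above.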

Abstract: Combining predictions from different models is a central problem in Bayesian inference and in machine learning more broadly. Currently, these predictive distributions are almost exclusively combined using linear mixtures such as Bayesian model averaging, Bayesian stacking, and mixtures of experts. Such linear mixtures impose idiosyncrasies, such as multi-modality, that may be undesirable in some applications. While alternative strategies exist (e.g. geometric bridge or superposition), optimising their parameters usually requires repeatedly computing an intractable normalising constant. We present two novel Bayesian model combination tools. These generalise model stacking, but combine posterior densities by log-linear pooling (locking) and quantum superposition (quacking). To optimise model weights while avoiding the burden of normalising constants, we investigate the Hyvärinen score of the combined posterior predictions. We demonstrate locking with an illustrative example and discuss its practical application with importance sampling.
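
The key reason the Hyvärinen score sidesteps normalising constants is that it depends only on x-derivatives of log p. For a log-linear pool p_w(x) ∝ Π_k p_k(x)^{w_k}, log p_w(x) = Σ_k w_k log p_k(x) - log Z(w), and log Z(w) vanishes under differentiation in x. The sketch below assumes two closed-form Gaussian predictive densities (hypothetical values `MU`, `SD`) in place of posterior predictives, restricts weights to the simplex as in stacking (an assumption about locking), and picks w by grid search on the average score 2 ∂²ₓ log p_w + (∂ₓ log p_w)². Data are simulated from a known pool so that recovery of the weights can be checked.

```python
import numpy as np

# Hypothetical Gaussian predictive densities N(MU_k, SD_k^2) for two models.
MU = np.array([-1.0, 2.0])
SD = np.array([1.0, 2.0])

def pooled_gaussian(w):
    """Log-linear pool of Gaussians is Gaussian: returns (mean, variance).
    Used here only to simulate data with known weights."""
    prec = np.sum(w / SD**2)
    mean = np.sum(w * MU / SD**2) / prec
    return mean, 1.0 / prec

def hyvarinen_score(w, x):
    """Average Hyvarinen score of the log-linear pool; log Z(w) never appears."""
    dlogp = -(x[:, None] - MU) / SD**2   # (n, K): d/dx log p_k(x)
    d2logp = -1.0 / SD**2                # (K,):   d2/dx2 log p_k(x)
    grad = dlogp @ w                     # d/dx log p_w(x)
    lap = d2logp @ w                     # d2/dx2 log p_w(x)
    return np.mean(2.0 * lap + grad**2)

rng = np.random.default_rng(1)
m, v = pooled_gaussian(np.array([0.3, 0.7]))
x = rng.normal(m, np.sqrt(v), size=5000)

grid = np.linspace(0.0, 1.0, 1001)
scores = [hyvarinen_score(np.array([g, 1.0 - g]), x) for g in grid]
w1_hat = grid[int(np.argmin(scores))]
print(w1_hat)                            # close to the generating weight 0.3
```

Because the score is quadratic in w here, a grid search is enough; with posterior draws instead of closed-form densities, the derivatives of each log p_k would have to be estimated, for instance with the importance sampling the abstract mentions.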