Matthias Katzfuss

Vecchia Gaussian Process Ensembles on Internal Representations of Deep Neural Networks

May 26, 2023
Felix Jimenez, Matthias Katzfuss

For regression tasks, standard Gaussian processes (GPs) provide natural uncertainty quantification, while deep neural networks (DNNs) excel at representation learning. We propose to synergistically combine these two approaches in a hybrid method consisting of an ensemble of GPs built on the output of hidden layers of a DNN. GP scalability is achieved via Vecchia approximations that exploit nearest-neighbor conditional independence. The resulting deep Vecchia ensemble not only imbues the DNN with uncertainty quantification but can also provide more accurate and robust predictions. We demonstrate the utility of our model on several datasets and carry out experiments to understand the inner workings of the proposed method.

* 16 pages, 7 figures 
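
The construction described in the abstract above can be sketched in a few lines: extract each hidden-layer representation from a trained network, fit a nearest-neighbor GP on each representation, and combine the layer-wise predictions. The sketch below is a rough illustration only, not the authors' implementation; a simple k-nearest-neighbor local GP with fixed hyperparameters stands in for a full Vecchia approximation, and all function names and toy data are invented for the example.

```python
# Hypothetical sketch: one local nearest-neighbor GP per hidden-layer
# representation of a small MLP, with layer-wise predictions averaged.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.neighbors import NearestNeighbors

def rbf_kernel(A, B, lengthscale=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def local_gp_predict(H_train, y_train, H_test, k=25, noise=1e-2):
    """Predict each test point from its k nearest neighbors in representation
    space (a crude stand-in for Vecchia-style nearest-neighbor conditioning)."""
    nn = NearestNeighbors(n_neighbors=k).fit(H_train)
    _, idx = nn.kneighbors(H_test)
    mean, var = np.empty(len(H_test)), np.empty(len(H_test))
    for i, nbrs in enumerate(idx):
        Knn = rbf_kernel(H_train[nbrs], H_train[nbrs]) + noise * np.eye(k)
        kxn = rbf_kernel(H_test[i:i + 1], H_train[nbrs])[0]
        w = np.linalg.solve(Knn, kxn)
        mean[i] = w @ y_train[nbrs]
        var[i] = 1.0 + noise - w @ kxn
    return mean, var

def hidden_activations(net, X):
    """ReLU activations of each hidden layer of a fitted sklearn MLP."""
    H, layers = X, []
    for W, b in zip(net.coefs_[:-1], net.intercepts_[:-1]):
        H = np.maximum(H @ W + b, 0.0)
        layers.append(H)
    return layers

# toy regression problem
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 4))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(500)
X_test = rng.uniform(-3, 3, size=(100, 4))

net = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000).fit(X, y)

# one GP per internal representation; crude average over the ensemble
# (the paper combines the layer-wise means and variances more carefully)
means, variances = [], []
for H_tr, H_te in zip(hidden_activations(net, X), hidden_activations(net, X_test)):
    m, v = local_gp_predict(H_tr, y, H_te)
    means.append(m)
    variances.append(v)
print(np.mean(means, axis=0)[:5], np.mean(variances, axis=0)[:5])
```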

Variational sparse inverse Cholesky approximation for latent Gaussian processes via double Kullback-Leibler minimization

Jan 30, 2023
Jian Cao, Myeongjong Kang, Felix Jimenez, Huiyan Sang, Florian Schafer, Matthias Katzfuss

To achieve scalable and accurate inference for latent Gaussian processes, we propose a variational approximation based on a family of Gaussian distributions whose covariance matrices have sparse inverse Cholesky (SIC) factors. We combine this variational approximation of the posterior with a similar and efficient SIC-restricted Kullback-Leibler-optimal approximation of the prior. We then focus on a particular SIC ordering and nearest-neighbor-based sparsity pattern resulting in highly accurate prior and posterior approximations. For this setting, our variational approximation can be computed via stochastic gradient descent in polylogarithmic time per iteration. We provide numerical comparisons showing that the proposed double-Kullback-Leibler-optimal Gaussian-process approximation (DKLGP) can sometimes be vastly more accurate than alternative approaches such as inducing-point and mean-field approximations at similar computational complexity.
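
The key building block, a sparse inverse Cholesky factor with a nearest-neighbor sparsity pattern, can be illustrated as below. This is a minimal sketch of the prior-approximation piece only (the full method adds a variational SIC posterior and stochastic-gradient training); the kernel choice, the ordering, and the name sic_factor are assumptions made for the example, and each column follows the standard KL-optimal closed form for a fixed conditioning set.

```python
# Hypothetical sketch: a sparse inverse Cholesky (SIC) factor with a
# nearest-neighbor sparsity pattern; each column follows the KL-optimal
# closed form for its fixed conditioning set.
import numpy as np
from scipy.spatial.distance import cdist

def kernel(A, B, ls=0.3):
    return np.exp(-cdist(A, B) / ls)           # exponential covariance, for illustration

def sic_factor(X, m=10, ls=0.3, nugget=1e-4):
    """Upper-triangular U with nonzeros on nearest previously ordered neighbors,
    so that U @ U.T approximates the precision matrix K^{-1}."""
    n = len(X)
    K = kernel(X, X, ls) + nugget * np.eye(n)
    D = cdist(X, X)
    U = np.zeros((n, n))
    for i in range(n):
        prev = np.argsort(D[i, :i])[:m]        # up to m nearest earlier points
        s = np.append(prev, i)                 # conditioning set, with i last
        Kss_inv = np.linalg.inv(K[np.ix_(s, s)])
        U[s, i] = Kss_inv[:, -1] / np.sqrt(Kss_inv[-1, -1])   # KL-optimal column
    return U

rng = np.random.default_rng(1)
X = rng.uniform(size=(300, 2))
U = sic_factor(X)

# sanity check: log-determinants of the approximate and exact precisions (should be close)
K = kernel(X, X) + 1e-4 * np.eye(len(X))
print(2 * np.sum(np.log(np.diag(U))), -np.linalg.slogdet(K)[1])
```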

Scalable Bayesian Optimization Using Vecchia Approximations of Gaussian Processes

Mar 02, 2022
Felix Jimenez, Matthias Katzfuss

Bayesian optimization is a technique for optimizing black-box target functions. At the core of Bayesian optimization is a surrogate model that predicts the output of the target function at previously unseen inputs to facilitate the selection of promising input values. Gaussian processes (GPs) are commonly used as surrogate models but are known to scale poorly with the number of observations. We adapt the Vecchia approximation, a popular GP approximation from spatial statistics, to enable scalable high-dimensional Bayesian optimization. We develop several improvements and extensions, including training warped GPs using mini-batch gradient descent, approximate neighbor search, and selecting multiple input values in parallel. We focus on the use of our warped Vecchia GP in trust-region Bayesian optimization via Thompson sampling. On several test functions and on two reinforcement-learning problems, our methods compared favorably to the state of the art.
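
A rough sketch of trust-region Bayesian optimization with Thompson sampling follows, with a small exact GP from scikit-learn standing in for the scalable warped Vecchia surrogate; the toy objective, the simplified trust-region update, and all variable names are illustrative assumptions rather than the paper's implementation.

```python
# Hypothetical sketch: trust-region Bayesian optimization with Thompson
# sampling; an exact scikit-learn GP stands in for the warped Vecchia surrogate.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def target(x):                                  # toy black-box function to minimize
    return np.sum((x - 0.3) ** 2, axis=-1)

rng = np.random.default_rng(0)
dim, n_iter, batch = 6, 15, 4
X = rng.uniform(size=(20, dim))                 # initial design
y = target(X)
length = 0.4                                    # trust-region edge length

for _ in range(n_iter):
    gp = GaussianProcessRegressor(Matern(nu=2.5), normalize_y=True).fit(X, y)
    center = X[np.argmin(y)]                    # trust region around the incumbent
    lo = np.clip(center - length / 2, 0.0, 1.0)
    hi = np.clip(center + length / 2, 0.0, 1.0)
    cand = rng.uniform(lo, hi, size=(512, dim))
    # Thompson sampling: each posterior draw selects one point of the batch
    draws = gp.sample_y(cand, n_samples=batch, random_state=int(rng.integers(10**6)))
    picks = cand[np.argmin(draws, axis=0)]
    y_new = target(picks)
    improved = y_new.min() < y.min()
    X, y = np.vstack([X, picks]), np.concatenate([y, y_new])
    # simplified trust-region rule: grow on improvement, shrink otherwise
    length = min(0.8, 1.5 * length) if improved else max(0.05, 0.7 * length)

print("best value found:", y.min())
```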

Scalable Gaussian-process regression and variable selection using Vecchia approximations

Mar 02, 2022
Jian Cao, Joseph Guinness, Marc G. Genton, Matthias Katzfuss

Gaussian process (GP) regression is a flexible, nonparametric approach to regression that naturally quantifies uncertainty. In many applications, the numbers of responses and covariates are both large, and a goal is to select covariates that are related to the response. For this setting, we propose a novel, scalable algorithm, coined VGPR, which optimizes a penalized GP log-likelihood based on the Vecchia GP approximation, an ordered conditional approximation from spatial statistics that implies a sparse Cholesky factor of the precision matrix. We traverse the regularization path from strong to weak penalization, sequentially adding candidate covariates based on the gradient of the log-likelihood and deselecting irrelevant covariates via a new quadratic constrained coordinate descent algorithm. We propose Vecchia-based mini-batch subsampling, which provides unbiased gradient estimators. The resulting procedure is scalable to millions of responses and thousands of covariates. Theoretical analysis and numerical studies demonstrate the improved scalability and accuracy relative to existing methods.

* 28 pages, 8 figures 
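
The forward pass of the selection idea, adding the covariate whose relevance parameter most steeply improves the likelihood, can be caricatured as follows. This toy sketch uses a small exact GP and finite-difference gradients in place of the penalized Vecchia log-likelihood, its analytic gradients, and the quadratic constrained coordinate-descent deselection step; all names and data are hypothetical.

```python
# Hypothetical toy sketch of gradient-based covariate selection for GP regression:
# a covariate enters the model when its relevance parameter has a strongly
# negative likelihood gradient at zero (finite differences stand in for the
# paper's analytic Vecchia-based gradients).
import numpy as np

def neg_loglik(theta, X, y, noise=0.1):
    """Negative GP log-likelihood; theta_j >= 0 is covariate j's squared inverse
    lengthscale, and theta_j = 0 means covariate j is excluded."""
    sq = (((X[:, None, :] - X[None, :, :]) ** 2) * theta).sum(-1)
    K = np.exp(-0.5 * sq) + noise * np.eye(len(y))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L)))

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
y = np.sin(X[:, 0]) + 0.5 * X[:, 3] + 0.05 * rng.standard_normal(n)   # only 0 and 3 matter

theta = np.zeros(p)                       # start with every covariate deselected
active = []
for _ in range(3):                        # greedily activate a few covariates
    base = neg_loglik(theta, X, y)
    grads = np.empty(p)
    for j in range(p):                    # finite-difference gradient per covariate
        t = theta.copy()
        t[j] += 1e-3
        grads[j] = (neg_loglik(t, X, y) - base) / 1e-3
    grads[active] = np.inf                # do not re-select active covariates
    j_new = int(np.argmin(grads))         # steepest decrease in the objective
    active.append(j_new)
    theta[j_new] = 1.0                    # crude "refit": switch the covariate on
print("selected covariates:", sorted(active))
```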

Scaled Vecchia approximation for fast computer-model emulation

May 29, 2020
Matthias Katzfuss, Joseph Guinness, Earl Lawrence

Many scientific phenomena are studied using computer experiments consisting of multiple runs of a computer model while varying the input settings. Gaussian processes (GPs) are a popular tool for the analysis of computer experiments, enabling interpolation between input settings, but direct GP inference is computationally infeasible for large datasets. We adapt and extend a powerful class of GP methods from spatial statistics to enable the scalable analysis and emulation of large computer experiments. Specifically, we apply Vecchia's ordered conditional approximation in a transformed input space, with each input scaled according to how strongly it relates to the computer-model response. The scaling is learned from the data, by estimating parameters in the GP covariance function using Fisher scoring. Our methods are highly scalable, enabling estimation, joint prediction and simulation in near-linear time in the number of model runs. In several numerical examples, our approach substantially outperformed existing methods.

* R code available at https://github.com/katzfuss-group/scaledVecchia 
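
The central idea, applying Vecchia's ordered conditional approximation after rescaling each input by its relevance, can be sketched as below (in Python, although the released code linked above is in R). The relevances are fixed by hand here rather than estimated via Fisher scoring, a random ordering replaces a more careful one, and all names and toy data are illustrative assumptions.

```python
# Hypothetical sketch of a scaled Vecchia log-likelihood: inputs are rescaled by
# per-dimension relevances before the nearest-neighbor conditioning sets are chosen.
import numpy as np
from scipy.spatial.distance import cdist

def kernel(A, B):
    return np.exp(-cdist(A, B))               # exponential kernel on scaled inputs

def scaled_vecchia_loglik(X, y, relevance, m=20, nugget=1e-4):
    Xs = X * relevance                         # scale each input by its relevance
    order = np.random.default_rng(0).permutation(len(y))   # simple random ordering
    Xs, y = Xs[order], y[order]
    D = cdist(Xs, Xs)
    ll = 0.0
    for i in range(len(y)):
        prev = np.argsort(D[i, :i])[:m]        # nearest previously ordered points
        s = np.append(prev, i)
        K = kernel(Xs[s], Xs[s]) + nugget * np.eye(len(s))
        if len(prev):
            w = np.linalg.solve(K[:-1, :-1], K[:-1, -1])
            mu, var = w @ y[prev], K[-1, -1] - w @ K[:-1, -1]   # conditional moments
        else:
            mu, var = 0.0, K[0, 0]
        ll += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mu) ** 2 / var)
    return ll

rng = np.random.default_rng(2)
X = rng.uniform(size=(1000, 3))
y = np.sin(4 * X[:, 0]) + 0.1 * rng.standard_normal(1000)
# input 0 drives the response, so it is given the largest relevance here
print(scaled_vecchia_loglik(X, y, relevance=np.array([4.0, 0.5, 0.5])))
```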