Eytan Bakshy

Empirical Gaussian Processes

Feb 12, 2026

Jihao Andreas Lin, Sebastian Ament, Louis C. Tiao, David Eriksson, Maximilian Balandat, Eytan Bakshy

Abstract: Gaussian processes (GPs) are powerful and widely used probabilistic regression models, but their effectiveness in practice is often limited by the choice of kernel function. This kernel function is typically handcrafted from a small set of standard functions, a process that requires expert knowledge, results in limited adaptivity to data, and imposes strong assumptions on the hypothesis space. We study Empirical GPs, a principled framework for constructing flexible, data-driven GP priors that overcome these limitations. Rather than relying on standard parametric kernels, we estimate the mean and covariance functions empirically from a corpus of historical observations, enabling the prior to reflect rich, non-trivial covariance structures present in the data. Theoretically, we show that the resulting model converges to the GP that is closest (in the KL-divergence sense) to the real data-generating process. Practically, we formulate the problem of learning the GP prior from independent datasets as likelihood estimation and derive an Expectation-Maximization algorithm with closed-form updates, allowing the model to handle heterogeneous observation locations across datasets. We demonstrate that Empirical GPs achieve competitive performance on learning curve extrapolation and time series forecasting benchmarks.
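
The core idea of an empirical prior can be illustrated in the simplest case, where every historical dataset is observed on the same grid of inputs. The Python sketch below assumes that simplification (the paper's EM algorithm exists precisely to handle heterogeneous observation locations, which this sketch does not): it uses the sample mean and sample covariance across historical curves as the GP prior and then conditions on a partially observed new curve with standard Gaussian conditioning. The function names, the noise level `noise_var`, and the synthetic sine-curve corpus are illustrative assumptions.

```python
import numpy as np


def empirical_prior(history):
    """Sample mean and covariance of historical curves on a shared grid.

    history: array of shape (num_datasets, num_grid_points).
    """
    mean = history.mean(axis=0)
    cov = np.cov(history, rowvar=False)  # unbiased sample covariance over datasets
    return mean, cov


def condition(mean, cov, obs_idx, y_obs, noise_var=1e-4):
    """Condition the empirical prior on noisy observations at grid indices obs_idx."""
    obs_idx = np.asarray(obs_idx)
    k_oo = cov[np.ix_(obs_idx, obs_idx)] + noise_var * np.eye(len(obs_idx))
    k_ao = cov[:, obs_idx]  # cross-covariance: all grid points vs observed points
    alpha = np.linalg.solve(k_oo, y_obs - mean[obs_idx])
    post_mean = mean + k_ao @ alpha
    post_cov = cov - k_ao @ np.linalg.solve(k_oo, k_ao.T)
    return post_mean, post_cov


# Hypothetical corpus: randomly scaled and shifted sine curves on a common grid.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 30)
scales = rng.normal(1.0, 0.5, size=200)
offsets = rng.normal(0.0, 0.1, size=200)
history = scales[:, None] * np.sin(t) + offsets[:, None]
history += 0.01 * rng.standard_normal(history.shape)

mean, cov = empirical_prior(history)
obs_idx = np.arange(15)  # observe only the first half of a new curve
y_new = 2.0 * np.sin(t)
post_mean, post_cov = condition(mean, cov, obs_idx, y_new[obs_idx])
```

In this toy corpus the empirical covariance is dominated by the shared sine shape, so observing the first half of the new curve is enough to extrapolate the second half; a standard stationary kernel would generally not encode this shape-specific correlation.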

BONSAI: Bayesian Optimization with Natural Simplicity and Interpretability

Feb 06, 2026

Samuel Daulton, David Eriksson, Maximilian Balandat, Eytan Bakshy

Abstract: Bayesian optimization (BO) is a popular technique for sample-efficient optimization of black-box functions. In many applications, the parameters being tuned come with a carefully engineered default configuration, and practitioners only want to deviate from this default when necessary. Standard BO, however, does not aim to minimize deviation from the default and, in practice, often pushes weakly relevant parameters to the boundary of the search space. This makes it difficult to distinguish between important and spurious changes and increases the burden of vetting recommendations when the optimization objective omits relevant operational considerations. We introduce BONSAI, a default-aware BO policy that prunes low-impact deviations from a default configuration while explicitly controlling the loss in acquisition value. BONSAI is compatible with a variety of acquisition functions, including expected improvement and upper confidence bound (GP-UCB). We theoretically bound the regret incurred by BONSAI, showing that, under certain conditions, it enjoys the same no-regret property as vanilla GP-UCB. Across many real-world applications, we empirically find that BONSAI substantially reduces the number of non-default parameters in recommended configurations while maintaining competitive optimization performance, with little effect on wall time.

* 26 pages
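
One way to picture pruning low-impact deviations while controlling the loss in acquisition value is a greedy loop: repeatedly reset to its default the parameter whose reset costs the least acquisition value, and stop once any further reset would push the total loss past a tolerance. The Python sketch below is a hypothetical illustration of that idea, not the BONSAI policy from the paper; the function `prune_to_default`, the absolute tolerance `max_loss`, and the toy acquisition function are assumptions made for the example.

```python
import numpy as np


def prune_to_default(candidate, default, acq, max_loss):
    """Hypothetical greedy pruning of a BO candidate toward a default config.

    A reset is accepted only if the acquisition value of the pruned point stays
    within `max_loss` of the ORIGINAL candidate's value, so the total loss
    (not just the per-step loss) is bounded.
    """
    x = np.array(candidate, dtype=float)
    default = np.asarray(default, dtype=float)
    threshold = acq(x) - max_loss
    while True:
        best_val, best_idx = -np.inf, None
        for i in np.flatnonzero(x != default):  # parameters that still deviate
            trial = x.copy()
            trial[i] = default[i]
            val = acq(trial)
            if val >= threshold and val > best_val:
                best_val, best_idx = val, i
        if best_idx is None:  # no acceptable reset remains
            return x
        x[best_idx] = default[best_idx]


def toy_acq(x):
    # x[0] matters a lot, x[1] barely matters, x[2] has no effect at all.
    return -(x[0] - 0.8) ** 2 - 1e-3 * (x[1] - 0.3) ** 2


# Keeps x[0] = 0.8 and resets x[1] and x[2] to their defaults.
pruned = prune_to_default([0.8, 0.3, 1.0], [0.5, 0.5, 0.0], toy_acq, max_loss=0.01)
```

Measuring the tolerance against the original candidate rather than the previous step keeps many small resets from silently accumulating into a large loss.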

Scalable Gaussian Processes with Latent Kronecker Structure

Jun 07, 2025

Jihao Andreas Lin, Sebastian Ament, Maximilian Balandat, David Eriksson, José Miguel Hernández-Lobato, Eytan Bakshy

Abstract: Applying Gaussian processes (GPs) to very large datasets remains a challenge due to limited computational scalability. Matrix structures, such as the Kronecker product, can accelerate operations significantly, but their application commonly entails approximations or unrealistic assumptions. In particular, the most common way to create a Kronecker-structured kernel matrix is to evaluate a product kernel on gridded inputs that can be expressed as a Cartesian product. However, this structure is lost as soon as any observation is missing, because the observed inputs no longer form a Cartesian product; missing observations occur frequently in real-world data such as time series. To address this limitation, we propose leveraging latent Kronecker structure by expressing the kernel matrix of observed values as the projection of a latent Kronecker product. In combination with iterative linear system solvers and pathwise conditioning, our method facilitates inference of exact GPs while requiring substantially fewer computational resources than standard iterative methods. We demonstrate that our method outperforms state-of-the-art sparse and variational GPs on real-world datasets with up to five million examples, including robotics, automated machine learning, and climate applications.

* International Conference on Machine Learning 2025
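
The projection of a latent Kronecker product can be made concrete with a small Python sketch. Here the latent covariance is $K_1 \otimes K_2$ over a full $n_1 \times n_2$ grid, and the kernel matrix of the observed values is $P (K_1 \otimes K_2) P^\top$, where $P$ selects the observed grid cells. A matrix-vector product then only needs the identity $(K_1 \otimes K_2)\,\mathrm{vec}(V) = \mathrm{vec}(K_1 V K_2^\top)$ (row-major vectorization, matching `np.kron`), and a plain conjugate gradient (CG) solver uses nothing but such products. The RBF factors, the noise variance, the random missingness pattern, and the hand-written CG loop are illustrative assumptions; the pathwise conditioning used in the paper is not shown.

```python
import numpy as np


def rbf(a, b, lengthscale):
    """RBF kernel between two 1-D arrays of inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)


def latent_kron_mvm(k1, k2, mask, v, noise_var):
    """Compute (P (K1 kron K2) P^T + noise_var * I) @ v without forming K1 kron K2.

    mask: boolean array of shape (n1, n2) marking observed grid cells.
    v: vector with one entry per observed cell, in row-major order.
    """
    full = np.zeros(mask.shape)
    full[mask] = v                      # P^T v: scatter onto the latent grid
    full = k1 @ full @ k2.T             # (K1 kron K2) vec(full), row-major vec
    return full[mask] + noise_var * v   # P (...): gather the observed cells


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A given only x -> A x."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        ap = matvec(p)
        step = rs / (p @ ap)
        x += step * p
        r -= step * ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * np.linalg.norm(b):
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


rng = np.random.default_rng(0)
x1 = np.linspace(0, 1, 20)             # e.g. one input dimension
x2 = np.linspace(0, 1, 15)             # e.g. time steps
k1, k2 = rbf(x1, x1, 0.3), rbf(x2, x2, 0.2)
mask = rng.random((20, 15)) > 0.3      # roughly 30% of grid cells missing
y = rng.standard_normal(mask.sum())
noise_var = 0.1
alpha = conjugate_gradient(lambda v: latent_kron_mvm(k1, k2, mask, v, noise_var), y)
```

Each product costs $\mathcal{O}(n_1^2 n_2 + n_1 n_2^2)$ time and stores only $K_1$, $K_2$, and one grid-shaped array, instead of the dense $(n_1 n_2) \times (n_1 n_2)$ matrix.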

Robust Gaussian Processes via Relevance Pursuit

Oct 31, 2024

Sebastian Ament, Elizabeth Santorella, David Eriksson, Ben Letham, Maximilian Balandat, Eytan Bakshy

Abstract: Gaussian processes (GPs) are non-parametric probabilistic regression models that are popular due to their flexibility, data efficiency, and well-calibrated uncertainty estimates. However, standard GP models assume homoskedastic Gaussian noise, while many real-world applications are subject to non-Gaussian corruptions. Variants of GPs that are more robust to alternative noise models have been proposed, but they entail significant trade-offs between accuracy and robustness, and between computational requirements and theoretical guarantees. In this work, we propose and study a GP model that achieves robustness against sparse outliers by inferring data-point-specific noise levels with a sequential selection procedure, which we refer to as relevance pursuit, that maximizes the log marginal likelihood. We show, surprisingly, that the model can be parameterized such that the associated log marginal likelihood is strongly concave in the data-point-specific noise variances, a property rarely found in either robust regression objectives or GP marginal likelihoods. This in turn implies the weak submodularity of the corresponding subset selection problem, and thereby yields approximation guarantees for the proposed algorithm. We compare the model's performance to that of other approaches on diverse regression and Bayesian optimization tasks, including the challenging but common setting of sparse corruptions of the labels within or close to the function range.

* NeurIPS 2024 Article
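
A simplified version of the sequential selection can be written as greedy forward selection over data points: at each step, try giving each not-yet-selected point its own extra noise variance, keep the point whose best extra variance raises the log marginal likelihood the most, and repeat. The Python sketch below assumes a fixed RBF kernel, a fixed base noise level, and a coarse grid search over each candidate's extra variance in place of the continuous optimization and the specific parameterization analyzed in the paper; `relevance_pursuit`, `rho_grid`, and the synthetic data are illustrative.

```python
import numpy as np


def rbf(a, b, lengthscale=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)


def log_marginal_likelihood(k, y, noise):
    """GP log marginal likelihood with per-point noise variances `noise`."""
    chol = np.linalg.cholesky(k + np.diag(noise))
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))  # (K + D)^-1 y
    return (-0.5 * y @ alpha - np.log(np.diag(chol)).sum()
            - 0.5 * len(y) * np.log(2 * np.pi))


def relevance_pursuit(x, y, num_outliers, base_noise=1e-2,
                      rho_grid=(0.1, 1.0, 10.0, 100.0)):
    """Greedily pick points to receive extra noise variance (simplified sketch)."""
    k = rbf(x, x)
    rho = np.zeros(len(y))  # extra noise variance per data point
    selected = []
    for _ in range(num_outliers):
        best = (-np.inf, None, 0.0)
        for i in range(len(y)):
            if i in selected:
                continue
            for r in rho_grid:  # crude 1-D search over this point's extra variance
                trial = rho.copy()
                trial[i] = r
                lml = log_marginal_likelihood(k, y, base_noise + trial)
                if lml > best[0]:
                    best = (lml, i, r)
        _, idx, r = best
        selected.append(idx)
        rho[idx] = r
    return selected, rho


rng = np.random.default_rng(0)
x = np.linspace(0, 5, 20)
y = np.sin(x) + 0.05 * rng.standard_normal(20)
y[7] += 5.0  # one sparse, large corruption
selected, rho = relevance_pursuit(x, y, num_outliers=1)
```

A large extra variance effectively down-weights the selected point in the posterior, so the corrupted label no longer distorts the fit around it.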

Scaling Gaussian Processes for Learning Curve Prediction via Latent Kronecker Structure

Oct 11, 2024

Jihao Andreas Lin, Sebastian Ament, Maximilian Balandat, Eytan Bakshy

Abstract: A key task in AutoML is to model learning curves of machine learning models jointly as a function of model hyper-parameters and training progression. While Gaussian processes (GPs) are suitable for this task, naïve GPs require $\mathcal{O}(n^3 m^3)$ time and $\mathcal{O}(n^2 m^2)$ space for $n$ hyper-parameter configurations and $\mathcal{O}(m)$ learning curve observations per hyper-parameter configuration. Efficient inference via Kronecker structure is typically incompatible with early stopping, because early-stopped runs leave learning curve values missing. We impose latent Kronecker structure to leverage efficient product kernels while handling missing values. In particular, we interpret the joint covariance matrix of observed values as the projection of a latent Kronecker product. Combined with iterative linear solvers and structured matrix-vector multiplication, our method only requires $\mathcal{O}(n^3 + m^3)$ time and $\mathcal{O}(n^2 + m^2)$ space. We show that our GP model can match the performance of a Transformer on a learning curve prediction task.

* Bayesian Decision-making and Uncertainty Workshop at NeurIPS 2024
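
The claim that the covariance of observed values is a projection of a latent Kronecker product can be checked numerically. In the Python sketch below, each of $n$ hyper-parameter configurations is trained for up to $m$ epochs, early stopping leaves a ragged set of observed (configuration, epoch) pairs, and a product kernel $k((x, t), (x', t')) = k_x(x, x')\,k_t(t, t')$ is assumed. Selecting the observed rows and columns of $K_x \otimes K_t$ reproduces the kernel evaluated directly on the observed pairs. The RBF kernels, the early-stopping pattern, and all sizes are illustrative assumptions.

```python
import numpy as np


def rbf(a, b, lengthscale):
    """RBF kernel between the rows of a and b."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)


rng = np.random.default_rng(0)
n, m = 6, 10
configs = rng.random((n, 3))                     # n configurations, 3 hyper-parameters
epochs = np.arange(1, m + 1, dtype=float)[:, None]
stop = rng.integers(2, m + 1, size=n)            # epochs completed before early stopping
mask = epochs[:, 0][None, :] <= stop[:, None]    # (n, m): True where a value was observed

k_x = rbf(configs, configs, 0.5)
k_t = rbf(epochs, epochs, 3.0)

# Latent route: select observed rows/columns of the full Kronecker product.
obs = np.flatnonzero(mask.ravel())               # row-major index: config * m + epoch
k_latent = np.kron(k_x, k_t)[np.ix_(obs, obs)]

# Direct route: evaluate the product kernel on the observed (config, epoch) pairs.
ci, ti = np.nonzero(mask)
k_direct = k_x[np.ix_(ci, ci)] * k_t[np.ix_(ti, ti)]
```

The dense `np.kron` call is used here only to verify the identity; a scalable implementation would never form it and would instead apply products of the form $K_x V K_t^\top$ to an $n \times m$ array, which only requires storing $K_x$, $K_t$, and that array.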

Active Learning for Derivative-Based Global Sensitivity Analysis with Gaussian Processes

Jul 13, 2024

Syrine Belakaria, Benjamin Letham, Janardhan Rao Doppa, Barbara Engelhardt, Stefano Ermon, Eytan Bakshy

Abstract: We consider the problem of active learning for global sensitivity analysis of expensive black-box functions. Our aim is to efficiently learn the importance of different input variables; for example, in vehicle safety experimentation, we study the impact of the thickness of various components on safety objectives. Since function evaluations are expensive, we use active learning to prioritize experimental resources where they yield the most value. We propose novel active learning acquisition functions that directly target key quantities of derivative-based global sensitivity measures (DGSMs) under Gaussian process surrogate models. We showcase the first application of active learning directly to DGSMs, and develop tractable uncertainty reduction and information gain acquisition functions for these measures. Through comprehensive evaluation on synthetic and real-world problems, our study demonstrates that these active learning acquisition strategies substantially enhance the sample efficiency of DGSM estimation, particularly with limited evaluation budgets. Our work paves the way for more efficient and accurate sensitivity analysis in various scientific and engineering applications.
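
Derivative-based global sensitivity measures quantify the importance of input $x_d$ as $\nu_d = \mathbb{E}_{x}\big[(\partial f / \partial x_d)^2\big]$ under a distribution over inputs (here assumed uniform on the unit hypercube). The Python sketch below computes a simple plug-in estimate of $\nu_d$ by differentiating the posterior mean of a GP with an RBF kernel and averaging the squared gradient over Monte Carlo samples. It is meant only to make the target quantity concrete; it is not one of the paper's active learning acquisition functions, and the kernel, lengthscale, noise level, training grid, and test function are assumptions.

```python
import numpy as np


def rbf(a, b, lengthscale):
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)


def posterior_mean_grad(x_query, x_train, y_train, lengthscale=0.3, noise=1e-4):
    """Gradient of the GP posterior mean at each query point, shape (q, d)."""
    k = rbf(x_train, x_train, lengthscale) + noise * np.eye(len(x_train))
    alpha = np.linalg.solve(k, y_train)
    k_q = rbf(x_query, x_train, lengthscale)               # (q, n)
    diff = x_query[:, None, :] - x_train[None, :, :]       # (q, n, d)
    # For the RBF kernel: d/dx k(x, x_j) = -k(x, x_j) * (x - x_j) / lengthscale^2
    dk = -k_q[:, :, None] * diff / lengthscale**2
    return np.einsum("qnd,n->qd", dk, alpha)


def dgsm_plugin(x_train, y_train, num_mc=2000, seed=0):
    """Plug-in DGSM estimate: mean squared posterior-mean gradient over U[0, 1]^d."""
    rng = np.random.default_rng(seed)
    x_mc = rng.random((num_mc, x_train.shape[1]))
    grads = posterior_mean_grad(x_mc, x_train, y_train)
    return (grads**2).mean(axis=0)


def f(x):
    # Hypothetical test function: x[:, 0] matters strongly, x[:, 1] only weakly.
    return np.sin(3 * x[:, 0]) + 0.1 * x[:, 1]


g = np.linspace(0, 1, 8)
x_train = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)  # 8 x 8 training grid
nu = dgsm_plugin(x_train, f(x_train))
```

An active learning strategy would choose the next evaluation to reduce uncertainty in (or gain information about) these $\nu_d$, rather than fixing the training grid in advance as done here.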

Joint Composite Latent Space Bayesian Optimization

Nov 03, 2023

Natalie Maus, Zhiyuan Jerry Lin, Maximilian Balandat, Eytan Bakshy

Abstract: Bayesian Optimization (BO) is a technique for sample-efficient black-box optimization that employs probabilistic models to identify promising input locations for evaluation. When dealing with composite-structured functions, such as f = g ∘ h, evaluating a specific location x yields observations of both the final outcome f(x) = g(h(x)) and the intermediate output(s) h(x). Previous research has shown that integrating information from these intermediate outputs can enhance BO performance substantially. However, existing methods struggle if the outputs h(x) are high-dimensional. Many relevant problems fall into this setting, including in the context of generative AI, molecular design, or robotics. To effectively tackle these challenges, we introduce Joint Composite Latent Space Bayesian Optimization (JoCo), a novel framework that jointly trains neural network encoders and probabilistic models to adaptively compress high-dimensional input and output spaces into manageable latent representations. This enables viable BO on these compressed representations, allowing JoCo to outperform other state-of-the-art methods in high-dimensional BO on a wide variety of simulated and real-world problems.
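
The composite setting can be illustrated with a toy problem in which each evaluation returns both a high-dimensional intermediate output $h(x)$ and the final outcome $f(x) = g(h(x))$. JoCo compresses such outputs with jointly trained neural network encoders; the Python sketch below swaps in a fixed linear PCA encoder (computed with NumPy's SVD) purely to show what compressing a high-dimensional output space into a latent representation means. Every function and size in it is a hypothetical stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)
output_dim = 500
mixing = rng.standard_normal((3, output_dim))  # hypothetical 3 -> 500 linear map


def h(x):
    """Hypothetical intermediate output: 500-dim, but driven by 3 latent factors."""
    z = np.stack([np.sin(3 * x[:, 0]), np.cos(2 * x[:, 1]), x[:, 0] * x[:, 1]], axis=1)
    return z @ mixing


def g(hx):
    """Hypothetical final outcome computed from the intermediate output."""
    return -np.mean((hx - 1.0) ** 2, axis=1)


def evaluate(x):
    hx = h(x)
    return hx, g(hx)  # one evaluation observes both h(x) and f(x) = g(h(x))


def fit_pca_encoder(outputs, latent_dim):
    """Linear stand-in for a learned encoder/decoder pair."""
    center = outputs.mean(axis=0)
    _, _, vt = np.linalg.svd(outputs - center, full_matrices=False)
    basis = vt[:latent_dim]  # (latent_dim, output_dim)

    def encode(o):
        return (o - center) @ basis.T

    def decode(z):
        return z @ basis + center

    return encode, decode


x = rng.random((50, 2))
hx, fx = evaluate(x)
encode, decode = fit_pca_encoder(hx, latent_dim=3)
latent = encode(hx)  # (50, 3) instead of (50, 500)
rel_err = np.linalg.norm(decode(latent) - hx) / np.linalg.norm(hx)
```

Here the outputs lie exactly on a 3-dimensional linear subspace, so a linear encoder reconstructs them almost perfectly; the motivation for learned, jointly trained encoders is that real intermediate outputs rarely have such simple structure.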

Unexpected Improvements to Expected Improvement for Bayesian Optimization

Oct 31, 2023

Sebastian Ament, Samuel Daulton, David Eriksson, Maximilian Balandat, Eytan Bakshy

Abstract: Expected Improvement (EI) is arguably the most popular acquisition function in Bayesian optimization and has found countless successful applications, but its performance is often exceeded by that of more recent methods. Notably, EI and its variants, including for the parallel and multi-objective settings, are challenging to optimize because their acquisition values vanish numerically in many regions. This difficulty generally increases as the number of observations, the dimensionality of the search space, or the number of constraints grows, resulting in performance that is inconsistent across the literature and most often sub-optimal. Herein, we propose LogEI, a new family of acquisition functions whose members have optima identical or approximately equal to those of their canonical counterparts but are substantially easier to optimize numerically. We demonstrate that numerical pathologies manifest themselves in "classic" analytic EI, Expected Hypervolume Improvement (EHVI), as well as their constrained, noisy, and parallel variants, and propose corresponding reformulations that remedy these pathologies. Our empirical results show that members of the LogEI family of acquisition functions substantially improve on the optimization performance of their canonical counterparts and, surprisingly, are on par with or exceed the performance of recent state-of-the-art acquisition functions, highlighting the understated role of numerical optimization in the literature.

* NeurIPS 2023 Spotlight

Practical Policy Optimization with Personalized Experimentation

Mar 30, 2023

Mia Garrard, Hanson Wang, Ben Letham, Shaun Singh, Abbas Kazerouni, Sarah Tan, Zehui Wang, Yin Huang, Yichun Hu, Chad Zhou(+2 more)

Abstract: Many organizations measure treatment effects via an experimentation platform to evaluate the causal effect of product variations prior to full-scale deployment. However, standard experimentation platforms do not perform optimally for end-user populations that exhibit heterogeneous treatment effects (HTEs). Here we present a personalized experimentation framework, Personalized Experiments (PEX), which optimizes treatment group assignment at the user level via HTE modeling and sequential decision policy optimization, targeting multiple short-term and long-term outcomes simultaneously. We describe an end-to-end workflow that has proven successful in practice and can be readily implemented using open-source software.

* 5 pages, 2 figures
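
The step from heterogeneous treatment effect (HTE) estimates to a personalized assignment policy can be sketched in a few lines. The Python example below makes strong simplifying assumptions that are not from the paper: users fall into discrete segments, treatment is binary, the HTE model is a per-segment difference in means from a randomized experiment, and multiple outcomes are combined with fixed weights into one score. The policy then treats a segment only if its estimated weighted effect is positive. The function names, weights, and synthetic data are hypothetical.

```python
import numpy as np


def segment_effects(segment, treated, outcomes, num_segments):
    """Per-segment difference in means (treated minus control) for each outcome."""
    effects = np.zeros((num_segments, outcomes.shape[1]))
    for s in range(num_segments):
        in_s = segment == s
        effects[s] = (outcomes[in_s & treated].mean(axis=0)
                      - outcomes[in_s & ~treated].mean(axis=0))
    return effects


def assign_policy(effects, weights):
    """Treat a segment only if its weighted effect across outcomes is positive."""
    return effects @ weights > 0


rng = np.random.default_rng(0)
n_users, num_segments = 20_000, 2
segment = rng.integers(0, num_segments, size=n_users)
treated = rng.random(n_users) < 0.5  # randomized A/B assignment
# Hypothetical ground truth: segment 0 benefits from treatment; segment 1 is harmed
# on the short-term outcome (column 0) and unaffected on the long-term one (column 1).
true_effect = np.array([[0.5, 0.2], [-0.4, 0.0]])
outcomes = rng.normal(size=(n_users, 2)) + treated[:, None] * true_effect[segment]
effects = segment_effects(segment, treated, outcomes, num_segments)
policy = assign_policy(effects, weights=np.array([0.7, 0.3]))
```

A uniform rollout would apply the same decision to both segments; the personalized policy treats only the segment whose estimated trade-off across outcomes is favorable.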

qEUBO: A Decision-Theoretic Acquisition Function for Preferential Bayesian Optimization

Mar 28, 2023

Raul Astudillo, Zhiyuan Jerry Lin, Eytan Bakshy, Peter I. Frazier

Abstract: Preferential Bayesian optimization (PBO) is a framework for optimizing a decision maker's latent utility function using preference feedback. This work introduces the expected utility of the best option (qEUBO) as a novel acquisition function for PBO. When the decision maker's responses are noise-free, we show that qEUBO is one-step Bayes optimal and thus equivalent to the popular knowledge gradient acquisition function. We also show that qEUBO enjoys an additive constant approximation guarantee to the one-step Bayes-optimal policy when the decision maker's responses are corrupted by noise. We provide an extensive evaluation of qEUBO and demonstrate that it outperforms the state-of-the-art acquisition functions for PBO across many settings. Finally, we show that, under sufficient regularity conditions, qEUBO's Bayesian simple regret converges to zero at a rate $o(1/n)$ as the number of queries, $n$, goes to infinity. In contrast, we show that simple regret under qEI, a popular acquisition function for standard BO often used for PBO, can fail to converge to zero. Enjoying superior performance, simple computation, and a grounded decision-theoretic justification, qEUBO is a promising acquisition function for PBO.

* In Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS) 2023
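
For a batch of $q$ options $x_1, \dots, x_q$ shown to the decision maker, qEUBO is the expected utility of the best option, $\mathbb{E}[\max_i f(x_i)]$, taken under the posterior over the latent utility $f$. When that posterior is Gaussian at the $q$ points, the expectation can be estimated by Monte Carlo, as in the Python sketch below. The joint mean and covariance are supplied directly here as assumptions; in practice they would come from a preference-based GP model, which is not shown, and the function name `qeubo_mc` is illustrative.

```python
import numpy as np


def qeubo_mc(mean, cov, num_samples=200_000, seed=0):
    """Monte Carlo estimate of E[max_i f(x_i)] for f ~ N(mean, cov) at q points."""
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(mean, cov, size=num_samples)  # (S, q)
    return samples.max(axis=1).mean()


# Two independent standard normal utilities: the exact value is 1 / sqrt(pi).
estimate = qeubo_mc(np.zeros(2), np.eye(2))
exact = 1.0 / np.sqrt(np.pi)
```

Compared with qEI, $\mathbb{E}[\max(\max_i f(x_i) - f^*, 0)]$, this estimator needs no incumbent value $f^*$, which is convenient in PBO because the utility values themselves are never observed directly.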
