Daniel Golovin

SmartChoices: Augmenting Software with Learned Implementations

Apr 12, 2023
Daniel Golovin, Gabor Bartok, Eric Chen, Emily Donahue, Tzu-Kuo Huang, Efi Kokiopoulou, Ruoyan Qin, Nikhil Sarda, Justin Sybrandt, Vincent Tjeng

We are living in a golden age of machine learning. Powerful models are being trained to perform many tasks far better than is possible using traditional software engineering approaches alone. However, developing and deploying those models in existing software systems remains difficult. In this paper we present SmartChoices, a novel approach to incorporating machine learning into mature software stacks easily, safely, and effectively. We explain the overall design philosophy and present case studies using SmartChoices within large scale industrial systems.
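
The abstract describes wrapping a learned model behind a narrow interface so that it can stand in for a hand-tuned constant or heuristic inside existing code. The sketch below is purely illustrative of that context-choice-feedback pattern; the class and method names are invented for this example and are not the SmartChoices API.

```python
# Hypothetical illustration of a "learned implementation" replacing a hand-tuned
# constant: the call site asks for a value given some runtime context and later
# reports how well that value worked.  All names here are invented; this is not
# the SmartChoices API.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LearnedChoice:
    candidates: List[int]          # values the learned policy may return
    default: int                   # the original hand-tuned value, kept as a safe fallback
    history: List[Tuple[Dict[str, float], int, float]] = field(default_factory=list)

    def choose(self, context: Dict[str, float]) -> int:
        """Return a candidate for this context; fall back to the default until
        enough feedback exists (a stand-in for a real learned model)."""
        if len(self.history) < 100:
            return self.default
        totals = {c: 0.0 for c in self.candidates}
        counts = {c: 1e-9 for c in self.candidates}
        for _, choice, reward in self.history:
            totals[choice] += reward
            counts[choice] += 1
        return max(self.candidates, key=lambda c: totals[c] / counts[c])

    def feedback(self, context: Dict[str, float], choice: int, reward: float) -> None:
        """Record the observed outcome so future choices can improve."""
        self.history.append((context, choice, reward))


# At the call site, `BUFFER_SIZE = 8192` becomes a learned choice with a fallback.
buffer_size = LearnedChoice(candidates=[4096, 8192, 16384], default=8192)
ctx = {"queue_length": 3.0}
size = buffer_size.choose(ctx)
# ... serve the request with `size`, measure latency ...
buffer_size.feedback(ctx, size, reward=-12.5)   # e.g. negative latency in ms
```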

Open Source Vizier: Distributed Infrastructure and API for Reliable and Flexible Blackbox Optimization

Jul 27, 2022
Xingyou Song, Sagi Perel, Chansoo Lee, Greg Kochanski, Daniel Golovin

Vizier is the de-facto blackbox and hyperparameter optimization service across Google, having optimized some of Google's largest products and research efforts. To operate at the scale of tuning thousands of users' critical systems, Google Vizier solved key design challenges in providing multiple different features, while remaining fully fault-tolerant. In this paper, we introduce Open Source (OSS) Vizier, a standalone Python-based interface for blackbox optimization and research, based on the Google-internal Vizier infrastructure and framework. OSS Vizier provides an API capable of defining and solving a wide variety of optimization problems, including multi-metric, early stopping, transfer learning, and conditional search. Furthermore, it is designed to be a distributed system that assures reliability, and allows multiple parallel evaluations of the user's objective function. The flexible RPC-based infrastructure allows users to access OSS Vizier from binaries written in any language. OSS Vizier also provides a back-end ("Pythia") API that gives algorithm authors a way to interface new algorithms with the core OSS Vizier system. OSS Vizier is available at https://github.com/google/vizier.

* Published as a conference paper for the systems track at the 1st International Conference on Automated Machine Learning (AutoML-Conf 2022). Code can be found at https://github.com/google/vizier 
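
To make the client workflow concrete, here is a minimal sketch that follows the usage pattern documented in the OSS Vizier repository (define a search space and metric, then loop over suggest/complete). Module and method names are taken from that documentation as I recall them and may differ slightly between releases, so treat the details as approximate.

```python
# Minimal OSS Vizier client loop, modeled on the quickstart in the repository's
# documentation; verify the exact API against the installed version.
from vizier.service import clients
from vizier.service import pyvizier as vz


def evaluate(x: float, y: float) -> float:
    """A toy stand-in for the user's expensive objective function."""
    return -(x - 0.3) ** 2 - (y - 0.7) ** 2


# Search space, metric to maximize, and algorithm choice.
study_config = vz.StudyConfig(algorithm='DEFAULT')
study_config.search_space.root.add_float_param('x', 0.0, 1.0)
study_config.search_space.root.add_float_param('y', 0.0, 1.0)
study_config.metric_information.append(
    vz.MetricInformation(name='objective', goal=vz.ObjectiveMetricGoal.MAXIMIZE))

# The client creates a local service implicitly; pointing it at a remote server
# over RPC is what enables parallel evaluations from many workers.
study = clients.Study.from_study_config(
    study_config, owner='demo_owner', study_id='quickstart')

for _ in range(20):
    for suggestion in study.suggest(count=1):
        params = suggestion.parameters
        suggestion.complete(
            vz.Measurement(metrics={'objective': evaluate(params['x'], params['y'])}))
```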

Random Hypervolume Scalarizations for Provable Multi-Objective Black Box Optimization

Jun 09, 2020
Daniel Golovin, Qiuyi Zhang

Single-objective black box optimization (also known as zeroth-order optimization) is the process of minimizing a scalar objective $f(x)$, given evaluations at adaptively chosen inputs $x$. In this paper, we consider multi-objective optimization, where $f(x)$ outputs a vector of possibly competing objectives and the goal is to converge to the Pareto frontier. Quantitatively, we wish to maximize the standard hypervolume indicator metric, which measures the dominated hypervolume of the entire set of chosen inputs. In this paper, we introduce a novel scalarization function, which we term the hypervolume scalarization, and show that drawing random scalarizations from an appropriately chosen distribution can be used to efficiently approximate the hypervolume indicator metric. We utilize this connection to show that Bayesian optimization with our scalarization via common acquisition functions, such as Thompson Sampling or Upper Confidence Bound, provably converges to the whole Pareto frontier by deriving tight hypervolume regret bounds on the order of $\widetilde{O}(\sqrt{T})$. Furthermore, we highlight the general utility of our scalarization framework by showing that any provably convergent single-objective optimization process can be effortlessly converted to a multi-objective optimization process with provable convergence guarantees.

* ICML 2020 
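
The scalarization connection can be checked numerically. The sketch below draws random directions from the positive orthant of the unit sphere, scalarizes each point, and averages the per-direction maxima; up to a normalizing constant this should recover the dominated hypervolume. The scalarization form and constant are written as I understand them from the paper, so treat them as assumptions to verify against the text.

```python
# Monte Carlo sketch of estimating the hypervolume indicator with random
# scalarizations.  The scalarization s_lambda(y) = min_i((y_i - z_i)/lambda_i)_+^k
# and the constant c_k are my reading of the paper's construction, not a quote.
import numpy as np
from scipy.special import gamma


def hypervolume_estimate(points, ref, num_dirs=100_000, seed=0):
    Y = np.asarray(points, dtype=float)     # (m, k): m points, k objectives
    z = np.asarray(ref, dtype=float)
    m, k = Y.shape
    rng = np.random.default_rng(seed)
    # Directions drawn uniformly from the positive orthant of the unit sphere.
    lam = np.abs(rng.normal(size=(num_dirs, k)))
    lam /= np.linalg.norm(lam, axis=1, keepdims=True)
    # Scalarize every point along every direction, then take the best point.
    ratios = (Y[None, :, :] - z) / lam[:, None, :]          # (num_dirs, m, k)
    s = np.clip(ratios.min(axis=2), 0.0, None) ** k          # (num_dirs, m)
    c_k = np.pi ** (k / 2) / (2 ** k * gamma(k / 2 + 1))     # normalizer (assumed)
    return c_k * s.max(axis=1).mean()


# Single point (1, 1) with reference (0, 0): the true dominated hypervolume is 1.
print(hypervolume_estimate([[1.0, 1.0]], ref=[0.0, 0.0]))    # ~1.0
```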

Gradientless Descent: High-Dimensional Zeroth-Order Optimization

Nov 19, 2019
Daniel Golovin, John Karro, Greg Kochanski, Chansoo Lee, Xingyou Song, Qiuyi Zhang

Zeroth-order optimization is the process of minimizing an objective $f(x)$, given oracle access to evaluations at adaptively chosen inputs $x$. In this paper, we present two simple yet powerful GradientLess Descent (GLD) algorithms that do not rely on an underlying gradient estimate and are numerically stable. We analyze our algorithms from a novel geometric perspective, presenting an analysis that shows convergence within an $\epsilon$-ball of the optimum in $O(kQ\log(n)\log(R/\epsilon))$ evaluations, for any monotone transform of a smooth and strongly convex objective with latent dimension $k < n$, where the input dimension is $n$, $R$ is the diameter of the input space, and $Q$ is the condition number. Our rates are the first of their kind to be both 1) poly-logarithmically dependent on dimensionality and 2) invariant under monotone transformations. We further leverage our geometric perspective to show that our analysis is optimal. Both monotone invariance and the ability to exploit a low latent dimensionality are key to the empirical success of our algorithms, as demonstrated on BBOB and MuJoCo benchmarks.

* 11 main pages, 26 total pages 
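
As a rough illustration of the gradient-free, comparison-based search the abstract describes, here is a simplified sketch of a GLD-style loop that samples from balls of geometrically spaced radii around the current point and accepts only improvements. The paper's actual radius schedule and algorithm variants differ in detail, so read this as an approximation rather than the algorithm itself.

```python
# Simplified sketch of a GradientLess-Descent-style loop: sample candidates from
# balls of geometrically spaced radii around the current point and keep any
# improvement.  The radius schedule and stopping rule here are approximations,
# not the paper's exact algorithm.
import numpy as np


def gld_search(f, x0, max_radius, min_radius, num_iters=500, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    n = x.size
    num_scales = int(np.ceil(np.log2(max_radius / min_radius))) + 1
    radii = [max_radius * 2.0 ** (-j) for j in range(num_scales)]
    for _ in range(num_iters):
        for r in radii:
            u = rng.normal(size=n)
            u *= rng.uniform() ** (1.0 / n) / np.linalg.norm(u)   # uniform point in unit ball
            y = x + r * u
            fy = f(y)
            if fy < fx:                 # only comparisons, never gradient estimates
                x, fx = y, fy
    return x, fx


# Monotone transform (exp) of a strongly convex quadratic: the comparisons GLD
# makes are unaffected by the transform, which is the invariance the abstract highlights.
f = lambda v: np.exp(np.sum((v - 3.0) ** 2))
x_best, f_best = gld_search(f, x0=np.zeros(5), max_radius=8.0, min_radius=1e-3)
print(x_best)   # approaches the optimum at (3, 3, 3, 3, 3)
```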

Adaptive Submodularity: Theory and Applications in Active Learning and Stochastic Optimization

Dec 06, 2017
Daniel Golovin, Andreas Krause

Solving stochastic optimization problems under partial observability, where one needs to adaptively make decisions with uncertain outcomes, is a fundamental but notoriously difficult challenge. In this paper, we introduce the concept of adaptive submodularity, generalizing submodular set functions to adaptive policies. We prove that if a problem satisfies this property, a simple adaptive greedy algorithm is guaranteed to be competitive with the optimal policy. In addition to providing performance guarantees for both stochastic maximization and coverage, adaptive submodularity can be exploited to drastically speed up the greedy algorithm by using lazy evaluations. We illustrate the usefulness of the concept by giving several examples of adaptive submodular objectives arising in diverse applications including sensor placement, viral marketing and active learning. Proving adaptive submodularity for these problems allows us to recover existing results in these applications as special cases, improve approximation guarantees and handle natural generalizations.

* 60 pages, 6 figures. Version 5 addresses a flaw in the proof of Theorem 13 identified by Nan and Saligrama (2017). The revision includes a weaker version of Theorem 13, guaranteeing squared logarithmic approximation under an additional strong adaptive submodularity condition. This condition is met by all applications considered in the paper, as discussed in the revised Sections 7, 8 and 9 
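
The adaptive greedy policy and the lazy-evaluation speedup mentioned in the abstract can be sketched generically. The names below are illustrative, not the paper's notation; the key point is that adaptive submodularity makes previously computed conditional gains valid upper bounds, so most of them never need to be recomputed.

```python
# Generic sketch of adaptive greedy with lazy evaluations.  `expected_gain(item,
# observations)` is assumed to return the conditional expected marginal benefit
# of `item` given the outcomes observed so far, and `observe(item)` runs the
# item (places a sensor, runs a test, shows an ad) and returns its outcome.
import heapq
import itertools


def adaptive_greedy(items, expected_gain, observe, budget):
    observations = {}
    counter = itertools.count()          # tie-breaker so heap entries stay comparable
    heap = [(-expected_gain(it, observations), next(counter), it) for it in items]
    heapq.heapify(heap)
    for _ in range(min(budget, len(items))):
        while True:
            _, _, item = heapq.heappop(heap)
            gain = expected_gain(item, observations)       # refresh the stale bound
            if not heap or gain >= -heap[0][0]:            # still beats every other (stale) bound
                observations[item] = observe(item)         # commit; the outcome informs later picks
                break
            heapq.heappush(heap, (-gain, next(counter), item))
    return observations
```

Laziness is sound here because, under adaptive submodularity, each stored gain can only have decreased since it was computed, so it remains an upper bound on the item's current conditional value.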

Online Submodular Maximization under a Matroid Constraint with Application to Learning Assignments

Jul 03, 2014
Daniel Golovin, Andreas Krause, Matthew Streeter

Which ads should we display in sponsored search in order to maximize our revenue? How should we dynamically rank information sources to maximize the value of the ranking? These applications exhibit strong diminishing returns: Redundancy decreases the marginal utility of each ad or information source. We show that these and other problems can be formalized as repeatedly selecting an assignment of items to positions to maximize a sequence of monotone submodular functions that arrive one by one. We present an efficient algorithm for this general problem and analyze it in the no-regret model. Our algorithm possesses strong theoretical guarantees, such as a performance ratio that converges to the optimal constant of 1 - 1/e. We empirically evaluate our algorithm on two real-world online optimization problems on the web: ad allocation with submodular utilities, and dynamically ranking blogs to detect information cascades. Finally, we present a second algorithm that handles the more general case in which the feasible sets are given by a matroid constraint, while still maintaining a 1 - 1/e asymptotic performance ratio.

* 20 pages 
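
One way to picture the approach is the standard reduction behind online greedy assignment: run one no-regret experts algorithm per position and reward each copy with the marginal gain of its item given the items chosen for earlier positions. The sketch below shows that reduction with a plain Hedge update in the full-information setting; it is an illustration of the general idea, not the paper's exact algorithm (which also covers bandit feedback and matroid constraints).

```python
# Illustrative experts-per-position reduction (full-information Hedge), not the
# paper's exact algorithm.  `marginal_gain(t, prefix, item)` is an assumed
# callback returning f_t(prefix + [item]) - f_t(prefix) for the round-t
# submodular function f_t.
import numpy as np


def online_greedy_assignment(num_items, num_positions, num_rounds, marginal_gain,
                             eta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    weights = np.ones((num_positions, num_items))
    for t in range(num_rounds):
        picks = []
        for p in range(num_positions):
            probs = weights[p] / weights[p].sum()
            picks.append(int(rng.choice(num_items, p=probs)))   # item shown in slot p
        for p in range(num_positions):
            prefix = picks[:p]
            # Reward every item by its marginal contribution after earlier slots.
            gains = np.array([marginal_gain(t, prefix, i) for i in range(num_items)])
            weights[p] *= np.exp(eta * gains)
    return weights
```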

Near-Optimal Bayesian Active Learning with Noisy Observations

Dec 16, 2013
Daniel Golovin, Andreas Krause, Debajyoti Ray

We tackle the fundamental problem of Bayesian active learning with noise, where we need to adaptively select from a number of expensive tests in order to identify an unknown hypothesis sampled from a known prior distribution. In the case of noise-free observations, a greedy algorithm called generalized binary search (GBS) is known to perform near-optimally. We show that if the observations are noisy, perhaps surprisingly, GBS can perform very poorly. We develop EC2, a novel, greedy active learning algorithm and prove that it is competitive with the optimal policy, thus obtaining the first competitiveness guarantees for Bayesian active learning with noisy observations. Our bounds rely on a recently discovered diminishing returns property called adaptive submodularity, generalizing the classical notion of submodular set functions to adaptive policies. Our results hold even if the tests have non-uniform cost and their noise is correlated. We also propose EffECXtive, a particularly fast approximation of EC2, and evaluate it on a Bayesian experimental design problem involving human subjects, intended to tease apart competing economic theories of how people make decisions under uncertainty.

* 15 pages. Version 2 contains only one major change, namely an amended proof of Lemma 6 
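
To make the edge-cutting intuition concrete, here is a sketch of a single greedy step of an EC2-style rule, shown in a noise-free flavor for clarity and using illustrative names; the paper's treatment of noise, non-uniform test costs, and the exact objective is richer than this.

```python
# Sketch of one greedy EC2-style selection step (my paraphrase of the abstract,
# not the paper's implementation).  Hypotheses in different equivalence classes
# (e.g. mapping to different decisions) are joined by edges weighted by the
# product of their priors; a test outcome "cuts" an edge when it rules out an
# endpoint.  Greedily pick the test with the largest expected cut weight.
import itertools


def ec2_pick_test(hypotheses, prior, class_of, tests, outcomes, consistent):
    """`consistent(h, test, outcome)` says whether hypothesis h allows that outcome."""
    edges = [(h, g) for h, g in itertools.combinations(hypotheses, 2)
             if class_of[h] != class_of[g]]
    total_mass = sum(prior[h] for h in hypotheses)

    def expected_cut(test):
        value = 0.0
        for outcome in outcomes:
            p_outcome = sum(prior[h] for h in hypotheses
                            if consistent(h, test, outcome)) / total_mass
            cut = sum(prior[h] * prior[g] for h, g in edges
                      if not (consistent(h, test, outcome) and consistent(g, test, outcome)))
            value += p_outcome * cut
        return value

    return max(tests, key=expected_cut)
```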