Shali Jiang

Efficient Nonmyopic Bayesian Optimization via One-Shot Multi-Step Trees

Jun 29, 2020
Shali Jiang, Daniel R. Jiang, Maximilian Balandat, Brian Karrer, Jacob R. Gardner, Roman Garnett

Figures 1-4 for Efficient Nonmyopic Bayesian Optimization via One-Shot Multi-Step Trees

Bayesian optimization is a sequential decision-making framework for optimizing expensive-to-evaluate black-box functions. Computing a full lookahead policy amounts to solving a highly intractable stochastic dynamic program. Myopic approaches, such as expected improvement, are often adopted in practice, but they ignore the long-term impact of the immediate decision. Existing nonmyopic approaches are mostly heuristic and/or computationally expensive. In this paper, we provide the first efficient implementation of general multi-step lookahead Bayesian optimization, formulated as a sequence of nested optimization problems within a multi-step scenario tree. Instead of solving these problems in a nested way, we equivalently optimize all decision variables in the full tree jointly, in a "one-shot" fashion. Combining this with an efficient method for implementing multi-step Gaussian process "fantasization," we demonstrate that multi-step expected improvement is computationally tractable and exhibits performance superior to existing methods on a wide range of benchmarks.
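
As a toy illustration of the one-shot reformulation (illustrative only, not the paper's implementation: the stage reward `g` and the fantasy scenarios below are made up), the nested two-step problem max over x0 of the expected max over x1 can be flattened into a single joint search over x0 together with one x1 variable per scenario, and the two formulations agree:

```python
import numpy as np
from itertools import product

# Toy two-step lookahead value max_{x0} E_s[ max_{x1} g(x0, s, x1) ],
# computed (a) by nested maximization and (b) as one joint "one-shot"
# problem over x0 and a separate x1 variable for each scenario s.
grid = np.linspace(-1.0, 1.0, 7)     # shared candidate grid
scenarios = [-0.5, 1.2]              # stand-ins for fantasized outcomes

def g(x0, s, x1):
    # made-up stage reward after fantasizing outcome s at x0
    return -(x0 - 0.3) ** 2 + 0.5 * s * x1 - x1 ** 2

# (a) nested: inner max per scenario, average over scenarios, outer max
nested = max(
    np.mean([max(g(x0, s, x1) for x1 in grid) for s in scenarios])
    for x0 in grid
)

# (b) one-shot: a single flat search over (x0, one x1 per scenario);
# the per-scenario x1 variables decouple, so the optima coincide
one_shot = max(
    np.mean([g(x0, s, x1s[j]) for j, s in enumerate(scenarios)])
    for x0 in grid
    for x1s in product(grid, repeat=len(scenarios))
)

print(np.isclose(nested, one_shot))  # -> True
```

The grid enumeration here is only to make the equivalence exact; in the paper the joint problem over continuous tree variables is solved with gradient-based optimization.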

Nearly Optimal Risk Bounds for Kernel K-Means

Mar 09, 2020
Yong Liu, Lizhong Ding, Hua Zhang, Wenqi Ren, Xiao Zhang, Shali Jiang, Xinwang Liu, Weiping Wang

In this paper, we study the statistical properties of kernel $k$-means and obtain a nearly optimal excess risk bound, substantially improving the state-of-the-art bounds in existing clustering risk analyses. We further analyze the statistical effect of computational approximations of the Nyström kernel $k$-means, and demonstrate that it achieves the same statistical accuracy as the exact kernel $k$-means using only $\sqrt{nk}$ Nyström landmark points. To the best of our knowledge, such sharp excess risk bounds for kernel (or approximate kernel) $k$-means have never been established before.
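
A minimal sketch of the approximation scheme (illustrative only: the data, RBF kernel, bandwidth, and seeding below are assumptions, and plain Lloyd's iterations stand in for a full kernel $k$-means solver): draw roughly $\sqrt{nk}$ landmarks, build rank-$m$ Nyström features, and cluster those features.

```python
import numpy as np

# Sketch: kernel k-means via Nystrom features from m ~ sqrt(n*k) landmarks.
rng = np.random.default_rng(0)
n, k = 400, 4
X = np.concatenate([rng.normal(loc=c, scale=0.3, size=(n // k, 2))
                    for c in [(0, 0), (3, 0), (0, 3), (3, 3)]])

def rbf(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

m = int(np.sqrt(n * k))                        # sqrt(nk) landmark points
landmarks = X[rng.choice(n, size=m, replace=False)]
Kmm = rbf(landmarks, landmarks)
Knm = rbf(X, landmarks)

# Nystrom feature map: phi = K_nm K_mm^{-1/2}
w, V = np.linalg.eigh(Kmm + 1e-8 * np.eye(m))
phi = Knm @ V @ np.diag(1.0 / np.sqrt(np.clip(w, 1e-8, None))) @ V.T

# ordinary Lloyd's k-means on the Nystrom features, one seed per blob
centers = phi[[0, 100, 200, 300]]
for _ in range(20):
    labels = ((phi[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
    centers = np.stack([phi[labels == c].mean(0) for c in range(k)])

print(len(set(labels.tolist())))  # distinct clusters recovered
```

The point of the bound is that this $m \approx \sqrt{nk}$ feature space is already statistically as good as working with the full $n \times n$ kernel matrix.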

Efficient nonmyopic Bayesian optimization and quadrature

Oct 16, 2019
Shali Jiang, Henry Chai, Javier Gonzalez, Roman Garnett

Figures 1-4 for Efficient nonmyopic Bayesian optimization and quadrature

Finite-horizon sequential decision problems arise naturally in many machine learning contexts, including Bayesian optimization and Bayesian quadrature. Computing the optimal policy for such problems requires solving Bellman equations, which are generally intractable. Most existing work resorts to myopic approximations by limiting the decision horizon to only a single time-step, which can perform poorly at balancing exploration and exploitation. We propose a general framework for efficient, nonmyopic approximation of the optimal policy by drawing a connection between the optimal adaptive policy and its non-adaptive counterpart. Our proposal is to compute an optimal batch of points, then select a single point from within this batch to evaluate. We realize this idea for both Bayesian optimization and Bayesian quadrature and demonstrate that our proposed method significantly outperforms common myopic alternatives on a variety of tasks.

* 13 pages, 4 figures, 6 tables, 1 algorithm 
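
The batch-then-pick idea can be sketched on a toy problem (illustrative only: the `score` surface and the stand-in joint batch value below are made up, not the paper's acquisition functions): greedily build a batch of $b$ points whose joint value covers the high-value regions, then commit to evaluating only the single batch member with the best individual score.

```python
import numpy as np

# Toy "optimal batch, then pick one point from it" policy on a 1-D grid.
cand = np.linspace(0, 1, 101)
score = np.exp(-30 * (cand - 0.2) ** 2) + 0.8 * np.exp(-30 * (cand - 0.7) ** 2)

def batch_value(batch):
    # stand-in joint value: each candidate credits its nearest batch point
    d = np.abs(cand[:, None] - np.asarray(batch)[None, :]).min(axis=1)
    return float((score * np.exp(-10 * d)).sum())

b, batch = 3, []
for _ in range(b):                             # greedy batch construction
    gains = [batch_value(batch + [x]) for x in cand]
    batch.append(float(cand[int(np.argmax(gains))]))

# adaptive step: evaluate only the best-scoring member of the batch
idx = [int(np.argmin(np.abs(cand - x))) for x in batch]
next_x = batch[int(np.argmax(score[idx]))]
print(round(next_x, 2))
```

The non-adaptive batch spreads over both peaks, while the selected single point lands near the taller one; in the paper the batch is chosen by a proper joint acquisition function over the model posterior.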

D-VAE: A Variational Autoencoder for Directed Acyclic Graphs

May 30, 2019
Muhan Zhang, Shali Jiang, Zhicheng Cui, Roman Garnett, Yixin Chen

Figures 1-4 for D-VAE: A Variational Autoencoder for Directed Acyclic Graphs

Graph-structured data are abundant in the real world. Among different graph types, directed acyclic graphs (DAGs) are of particular interest to machine learning researchers, as many machine learning models are realized as computations on DAGs, including neural networks and Bayesian networks. In this paper, we study deep generative models for DAGs, and propose a novel DAG variational autoencoder (D-VAE). To encode DAGs into the latent space, we leverage graph neural networks. We propose an asynchronous message passing scheme that encodes the computations defined on DAGs, rather than the local graph structures captured by existing simultaneous message passing schemes. We demonstrate the effectiveness of our proposed D-VAE through two tasks: neural architecture search and Bayesian network structure learning. Experiments show that our model not only generates novel and valid DAGs, but also produces a smooth latent space that facilitates searching for DAGs with better performance through Bayesian optimization.
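
The asynchronous traversal order itself is easy to sketch (illustrative only: in D-VAE each node update is a learned GRU cell over predecessor states, whereas here a node's "state" is just one plus the sum of its predecessors' states): nodes are updated one at a time in topological order, so every node reads fully updated messages from all of its predecessors, unlike simultaneous schemes that update all nodes in lockstep.

```python
from collections import defaultdict

edges = [(0, 1), (0, 2), (1, 3), (2, 3)]      # a small diamond DAG
preds, succs = defaultdict(list), defaultdict(list)
for u, v in edges:
    preds[v].append(u)
    succs[u].append(v)

# Kahn's algorithm: a valid topological order of the 4 nodes
indeg = {v: len(preds[v]) for v in range(4)}
order, frontier = [], [v for v in range(4) if indeg[v] == 0]
while frontier:
    u = frontier.pop()
    order.append(u)
    for w in succs[u]:
        indeg[w] -= 1
        if indeg[w] == 0:
            frontier.append(w)

# asynchronous message passing: each node sees finished predecessors
state = {}
for u in order:
    state[u] = 1 + sum(state[p] for p in preds[u])

print(order, state[3])
```

Because node 3 is updated last, its state aggregates the fully propagated states of both paths through the diamond, which is what lets this scheme represent the DAG's computation rather than just its local structure.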

Efficient nonmyopic active search with applications in drug and materials discovery

Nov 23, 2018
Shali Jiang, Gustavo Malkomes, Benjamin Moseley, Roman Garnett

Figures 1-4 for Efficient nonmyopic active search with applications in drug and materials discovery

Active search is a learning paradigm for actively identifying as many members of a given class as possible. A critical target scenario is high-throughput screening for scientific discovery, such as drug or materials discovery. In this paper, we approach this problem within a Bayesian decision-theoretic framework. We first derive the Bayesian optimal policy under a natural utility, and establish the theoretical hardness of active search, proving that the optimal policy cannot be approximated to within any constant ratio. We also study the batch setting for the first time, where a batch of $b>1$ points can be queried at each iteration. We give an asymptotic lower bound, linear in batch size, on the adaptivity gap: how much we could lose by querying $b$ points at a time for $t$ iterations instead of one point at a time for $bt$ iterations. We then introduce a novel approach to nonmyopic approximations of the optimal policy that admits efficient computation. Our proposed policy automatically trades off exploration and exploitation, without relying on any tuning parameters. We also generalize our policy to the batch setting, and propose two approaches to tackle the combinatorial search challenge. We evaluate our proposed policies on large databases from drug discovery and materials science. Results demonstrate the superior performance of our proposed policy in both the sequential and batch settings; we also illustrate its nonmyopic behavior from several perspectives.

* Machine Learning for Molecules and Materials (NeurIPS 2018 Workshop) 
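
A toy two-step lookahead shows the nonmyopic effect (illustrative only: the probabilities and the hand-coded "model update" below are made up, not the paper's policy): myopically, point A has the highest hit probability, but querying point B is more informative because its outcome updates its neighbors, so the two-step expected utility prefers B.

```python
# Myopic vs. two-step active search on four candidate points.
# If B is positive, its two neighbors become very promising; if it is
# negative, they become near-worthless. A carries no such information.
prior = {"A": 0.6, "B": 0.5, "B1": 0.2, "B2": 0.2}

def posterior(point, outcome):
    # toy "model update": remove the queried point, adjust B's neighbors
    p = dict(prior)
    del p[point]
    if point == "B":
        p["B1"] = p["B2"] = 0.9 if outcome else 0.05
    return p

def two_step_utility(point):
    # expected hits from querying `point`, then the best single follow-up
    u = 0.0
    for outcome in (1, 0):
        w = prior[point] if outcome else 1 - prior[point]
        u += w * (outcome + max(posterior(point, outcome).values()))
    return u

myopic = max(prior, key=prior.get)             # -> "A"
uA, uB = two_step_utility("A"), two_step_utility("B")
print(myopic, round(uA, 3), round(uB, 3))      # uB > uA: lookahead picks B
```

Here the myopic policy queries A, while the two-step policy accepts a slightly lower immediate hit probability in exchange for information: exactly the exploration-exploitation trade-off the nonmyopic policy makes automatically.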