Yian Ma

Linear Convergence of Black-Box Variational Inference: Should We Stick the Landing?

Jul 27, 2023
Kyurae Kim, Yian Ma, Jacob R. Gardner

We prove that black-box variational inference (BBVI) with control variates, in particular the sticking-the-landing (STL) estimator, converges at a geometric (traditionally called "linear") rate when the variational family is perfectly specified. More specifically, we prove a quadratic bound on the gradient variance of the STL estimator, which also covers misspecified variational families. Combined with previous results on the quadratic variance condition, this directly implies convergence of BBVI with projected stochastic gradient descent. We also improve the existing analysis of the standard closed-form-entropy gradient estimator, enabling a comparison against the STL estimator and providing explicit non-asymptotic complexity guarantees for both.
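
The STL estimator at the heart of this analysis is simple to implement: it is the standard reparameterization gradient of the ELBO, except that the variational parameters inside log q are detached, which drops the zero-mean score term and keeps only the path derivative. Below is a minimal PyTorch sketch for a mean-field Gaussian family; the toy target log_p and all names are illustrative assumptions, not the paper's experimental setup.

```python
import math
import torch

def log_p(z):
    """Unnormalized log-posterior; a standard 2-D Gaussian purely for illustration."""
    return -0.5 * (z ** 2).sum(-1)

# Variational parameters of q_lambda = N(mu, diag(exp(rho))^2).
mu = torch.zeros(2, requires_grad=True)
rho = torch.zeros(2, requires_grad=True)  # log standard deviations

def log_q(z, mu, rho):
    return (-0.5 * ((z - mu) / rho.exp()) ** 2 - rho
            - 0.5 * math.log(2 * math.pi)).sum(-1)

def elbo_grad(stl: bool, n_samples: int = 16):
    eps = torch.randn(n_samples, 2)
    z = mu + rho.exp() * eps              # reparameterized sample z = t_lambda(eps)
    if stl:
        # Sticking the landing: evaluate log q at detached parameters, which
        # removes the zero-mean score term and keeps only the path derivative.
        lq = log_q(z, mu.detach(), rho.detach())
    else:
        lq = log_q(z, mu, rho)            # standard reparameterization estimator
    loss = -(log_p(z) - lq).mean()        # negative ELBO estimate
    return torch.autograd.grad(loss, [mu, rho])
```

Note that under a perfect variational fit the STL gradient is exactly zero for every sample, which is the mechanism behind the geometric rate in the well-specified case.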

Monte Carlo Sampling without Isoperimetry: A Reverse Diffusion Approach

Jul 05, 2023
Xunpeng Huang, Hanze Dong, Yifan Hao, Yian Ma, Tong Zhang

The efficacy of modern generative models commonly depends on the accuracy of score estimation along the diffusion path, with most work focusing on diffusion models and their ability to generate high-quality samples. This study instead investigates posterior sampling through reverse diffusion. An examination of the sampling literature reveals that score estimation can be transformed into a mean estimation problem via a decomposition of the transition kernel. By estimating the mean of this auxiliary distribution, the reverse diffusion process yields a novel posterior sampling algorithm that diverges from traditional gradient-based Markov chain Monte Carlo (MCMC) methods. We provide a convergence analysis in total variation distance and show that the proposed algorithm depends less on isoperimetric properties of the target than conventional MCMC techniques do, which justifies its superior performance for high-dimensional sampling at a fixed error tolerance. Our analytical framework also offers a fresh perspective on the complexity of score estimation at different time points, as characterized by the properties of the auxiliary distribution.
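
To make the "score estimation as mean estimation" idea concrete, here is a hedged NumPy sketch under an Ornstein-Uhlenbeck forward process: the score of the diffused density at (x, t) is recovered from the posterior mean of the auxiliary distribution over clean samples, estimated here by self-normalized importance sampling. The toy target, sample sizes, and Euler discretization are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def log_pi(x):
    """Unnormalized target log-density; a 2-D Gaussian mixture as a toy example."""
    a = -0.5 * ((x - 2.0) ** 2).sum(-1)
    b = -0.5 * ((x + 2.0) ** 2).sum(-1)
    return np.logaddexp(a, b)

def score_via_mean(x, alpha, sigma, rng, n=512):
    # Auxiliary distribution: p(x0 | x_t = x) ∝ pi(x0) * N(x0; x/alpha, (sigma/alpha)^2 I).
    # Sample the Gaussian factor and reweight by pi (self-normalized importance sampling).
    x0 = x / alpha + (sigma / alpha) * rng.standard_normal((n, x.shape[-1]))
    logw = log_pi(x0)
    w = np.exp(logw - logw.max())
    mean = (w[:, None] * x0).sum(0) / w.sum()   # estimated posterior mean of x0
    return (alpha * mean - x) / sigma ** 2      # Tweedie-style identity for the score

def sample(T=4.0, steps=200, dim=2, seed=1):
    # Euler discretization of the reverse SDE for the forward process dx = -x dt + sqrt(2) dW.
    rng = np.random.default_rng(seed)
    dt = T / steps
    x = rng.standard_normal(dim)                # initialize at the stationary N(0, I)
    for k in range(steps):
        t = T - k * dt
        alpha, sigma = np.exp(-t), np.sqrt(1.0 - np.exp(-2.0 * t))
        s = score_via_mean(x, alpha, sigma, rng)
        x = x + (x + 2.0 * s) * dt + np.sqrt(2.0 * dt) * rng.standard_normal(dim)
    return x
```

The key point the abstract makes is visible here: no gradient of log_pi is ever taken, so the method is not a gradient-based MCMC scheme.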

Black-Box Variational Inference Converges

May 24, 2023
Kyurae Kim, Kaiwen Wu, Jisu Oh, Yian Ma, Jacob R. Gardner

We provide the first convergence guarantee for full black-box variational inference (BBVI), also known as Monte Carlo variational inference. While preliminary investigations worked on simplified versions of BBVI (e.g., bounded domain, bounded support, optimizing only the scale, and so on), our setup does not require any such algorithmic modifications. Our results hold for the location-scale variational family and log-smooth posterior densities, with and without strong log-concavity. Our analysis also reveals that certain algorithm design choices commonly employed in practice, particularly nonlinear parameterizations of the scale of the variational approximation, can result in suboptimal convergence rates. Fortunately, running BBVI with proximal stochastic gradient descent removes these limitations and thus achieves the strongest known convergence rate guarantees. We evaluate this theoretical insight by comparing proximal SGD against other standard implementations of BBVI on large-scale Bayesian inference problems.
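
The proximal-SGD variant is easy to sketch for a diagonal Gaussian (location-scale) family: take an SGD step on the reparameterized energy term E_q[-log p], then apply the closed-form proximal operator of the negative entropy -sum_i log s_i to the scale. The toy target, step size, and iteration count below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2

def grad_neg_log_p(z):
    """-grad log p for a standard Gaussian target; a stand-in for a real posterior."""
    return z

m, s = np.zeros(d), np.ones(d)   # q = N(m, diag(s)^2)
gamma = 1e-2                     # step size

for it in range(2000):
    eps = rng.standard_normal(d)
    z = m + s * eps                    # reparameterized sample
    g = grad_neg_log_p(z)              # stochastic gradient of E_q[-log p] w.r.t. z
    m = m - gamma * g                  # SGD step on the location
    v = s - gamma * g * eps            # SGD step on the scale (energy term only)
    # Proximal step for the entropy: s = argmin_u 0.5*(u - v)^2 - gamma*log(u),
    # which has the closed form below and keeps s strictly positive.
    s = 0.5 * (v + np.sqrt(v ** 2 + 4.0 * gamma))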

* under review 

Disentangled Multi-Fidelity Deep Bayesian Active Learning

May 07, 2023
Dongxia Wu, Ruijia Niu, Matteo Chinazzi, Yian Ma, Rose Yu

To balance quality and cost, many areas of science and engineering run simulations at multiple levels of sophistication. Multi-fidelity active learning aims to learn a direct mapping from input parameters to simulation outputs by actively acquiring data from multiple fidelity levels. However, existing approaches based on Gaussian processes scale poorly to high-dimensional data. Other deep learning-based methods use a hierarchical structure that only supports passing information from low fidelity to high fidelity, which also propagates errors from low-fidelity representations to high-fidelity ones. We propose a novel disentangled deep Bayesian learning framework for multi-fidelity active learning that learns the surrogate models conditioned on the distribution of functions at multiple fidelities.
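
The paper's disentangled deep Bayesian surrogate is beyond a short snippet, but the surrounding loop is generic multi-fidelity active learning: score each candidate input at each fidelity by predictive uncertainty per unit cost, query the best one, and refit. The sketch below uses a bootstrap ensemble of polynomial fits as a crude stand-in for the surrogate; the simulators, costs, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(x, fid):
    """Stand-in simulators: fidelity 0 is cheap but biased, fidelity 1 is the expensive truth."""
    bias = 0.3 * x if fid == 0 else 0.0
    return np.sin(3 * x) + bias + 0.05 * rng.standard_normal()

cost = {0: 1.0, 1: 10.0}                           # query cost per fidelity
pool = np.linspace(-2.0, 2.0, 201)                 # candidate inputs
X = {f: list(rng.uniform(-2, 2, 10)) for f in (0, 1)}
Y = {f: [simulate(x, f) for x in X[f]] for f in (0, 1)}

for _ in range(15):
    best = None
    for f in (0, 1):
        xs, ys = np.array(X[f]), np.array(Y[f])
        # Bootstrap ensemble of cubic fits: a crude proxy for predictive uncertainty.
        preds = []
        for _ in range(20):
            idx = rng.integers(0, len(xs), len(xs))
            preds.append(np.polyval(np.polyfit(xs[idx], ys[idx], 3), pool))
        gain = np.array(preds).std(0) / cost[f]    # uncertainty per unit cost
        i = int(gain.argmax())
        if best is None or gain[i] > best[0]:
            best = (gain[i], f, pool[i])
    _, f, x = best
    X[f].append(x)
    Y[f].append(simulate(x, f))                    # acquire the most informative point
```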

On Optimal Early Stopping: Over-informative versus Under-informative Parametrization

Feb 20, 2022
Ruoqi Shen, Liyao Gao, Yian Ma

Early stopping is a simple and widely used method to prevent the over-training of neural networks. We develop theoretical results that reveal the relationship between the optimal early stopping time, the model dimension, and the sample size of the dataset for certain linear models. Our results demonstrate two very different behaviors depending on whether the model dimension exceeds the number of features or falls below it. While most previous works on linear models focus on the latter setting, we observe that in common deep learning tasks the dimension of the model often exceeds the number of features arising from the data, and we propose a model to study this setting. We demonstrate experimentally that our theoretical results on the optimal early stopping time correspond to the training process of deep neural networks.
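
The over-informative regime (model dimension exceeding the number of informative features) is easy to reproduce in a synthetic linear model: gradient descent first fits the signal and then the noise, so the excess risk is U-shaped in the iteration count and admits a finite optimal stopping time. The dimensions, noise level, and step size below are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 500                                   # sample size well below model dimension
theta_star = np.zeros(d)
theta_star[:20] = 1.0                             # only a few informative directions
X = rng.standard_normal((n, d))
y = X @ theta_star + rng.standard_normal(n)       # noisy labels

theta = np.zeros(d)
lr = 0.5 / np.linalg.norm(X, 2) ** 2              # step size below 1/L for the quadratic loss
risk = []
for t in range(500):
    theta -= lr * X.T @ (X @ theta - y)           # full-batch gradient descent
    risk.append(np.sum((theta - theta_star) ** 2))  # excess risk, computable here

t_opt = int(np.argmin(risk))                      # the optimal early stopping time
print(t_opt, risk[t_opt], risk[-1])               # risk at t_opt vs. training to the end
```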

* 30 pages, 15 figures 

Variational Refinement for Importance Sampling Using the Forward Kullback-Leibler Divergence

Jun 30, 2021
Ghassen Jerfel, Serena Wang, Clara Fannjiang, Katherine A. Heller, Yian Ma, Michael I. Jordan

Variational Inference (VI) is a popular alternative to asymptotically exact sampling in Bayesian inference. Its main workhorse is optimization over a reverse Kullback-Leibler divergence (RKL), which typically underestimates the tails of the posterior, leading to miscalibration and potential degeneracy. Importance sampling (IS), on the other hand, is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures. The quality of IS crucially depends on the choice of the proposal distribution. Ideally, the proposal distribution has heavier tails than the target, which is rarely achievable by minimizing the RKL. We thus propose a novel combination of optimization and sampling techniques for approximate Bayesian inference, constructing an IS proposal distribution through the minimization of a forward KL (FKL) divergence. This approach guarantees asymptotic consistency and fast convergence towards both the optimal IS estimator and the optimal variational approximation. We empirically demonstrate on real data that our method is competitive with variational boosting and MCMC.
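
A minimal sketch of the FKL idea, assuming a 1-D Gaussian variational family: the gradient of KL(p || q_lambda) is -E_p[grad_lambda log q_lambda], which can be estimated with self-normalized importance weights w proportional to p(z)/q(z) on samples drawn from the current q. The toy target, step sizes, and sample counts are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p(z):
    """Unnormalized target log-density; N(0, 2^2) as a stand-in for a posterior."""
    return -0.5 * z ** 2 / 4.0

m, log_s = 0.0, 0.0                  # q = N(m, s^2) with s = exp(log_s)
for it in range(2000):
    s = np.exp(log_s)
    z = m + s * rng.standard_normal(64)
    # Self-normalized importance weights w ∝ p(z) / q(z), with z ~ q.
    log_q = -0.5 * ((z - m) / s) ** 2 - log_s
    w = np.exp(log_p(z) - log_q)
    w /= w.sum()
    # FKL gradient: grad KL(p||q) = -E_p[grad log q], estimated with the weights.
    g_m = -(w * (z - m) / s ** 2).sum()
    g_ls = -(w * (((z - m) / s) ** 2 - 1.0)).sum()
    m -= 0.05 * g_m
    log_s -= 0.05 * g_ls
# The fitted q then serves as the IS proposal for downstream posterior expectations.
```

Because the FKL is mass-covering rather than mode-seeking, the fitted proposal tends toward the heavier tails that IS requires, in contrast to an RKL fit.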

* Accepted for the 37th Conference on Uncertainty in Artificial Intelligence (UAI 2021) 

DeepGLEAM: a hybrid mechanistic and deep learning model for COVID-19 forecasting

Feb 15, 2021
Dongxia Wu, Liyao Gao, Xinyue Xiong, Matteo Chinazzi, Alessandro Vespignani, Yian Ma, Rose Yu

We introduce DeepGLEAM, a hybrid model for COVID-19 forecasting. DeepGLEAM combines GLEAM, a mechanistic stochastic simulation model, with deep learning. It uses deep learning to learn correction terms for GLEAM's forecasts, which leads to improved performance. We further integrate various uncertainty quantification methods to generate confidence intervals. We demonstrate DeepGLEAM on real-world COVID-19 mortality forecasting tasks.
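
The hybrid structure reduces to learning a residual on top of the mechanistic forecast. The sketch below uses a small fully connected network on synthetic tensors purely to fix ideas; the actual model operates on spatiotemporal GLEAM forecasts and is more elaborate.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Synthetic stand-ins: GLEAM forecasts over a 4-week horizon and observed truth.
gleam = torch.randn(128, 4).abs()
truth = gleam + 0.2 * torch.randn(128, 4)

# Learn the correction term: a network maps the mechanistic forecast to a
# residual that is added back, so the hybrid never discards GLEAM's output.
net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 4))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for epoch in range(200):
    corrected = gleam + net(gleam)       # hybrid forecast = GLEAM + learned correction
    loss = ((corrected - truth) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```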

Underspecification Presents Challenges for Credibility in Modern Machine Learning

Nov 06, 2020
Alexander D'Amour, Katherine Heller, Dan Moldovan, Ben Adlam, Babak Alipanahi, Alex Beutel, Christina Chen, Jonathan Deaton, Jacob Eisenstein, Matthew D. Hoffman, Farhad Hormozdiari, Neil Houlsby, Shaobo Hou, Ghassen Jerfel, Alan Karthikesalingam, Mario Lucic, Yian Ma, Cory McLean, Diana Mincu, Akinori Mitani, Andrea Montanari, Zachary Nado, Vivek Natarajan, Christopher Nielson, Thomas F. Osborne, Rajiv Raman, Kim Ramasamy, Rory Sayres, Jessica Schrouff, Martin Seneviratne, Shannon Sequeira, Harini Suresh, Victor Veitch, Max Vladymyrov, Xuezhi Wang, Kellie Webster, Steve Yadlowsky, Taedong Yun, Xiaohua Zhai, D. Sculley

ML models often exhibit unexpectedly poor behavior when they are deployed in real-world domains. We identify underspecification as a key reason for these failures. An ML pipeline is underspecified when it can return many predictors with equivalently strong held-out performance in the training domain. Underspecification is common in modern ML pipelines, such as those based on deep learning. Predictors returned by underspecified pipelines are often treated as equivalent based on their training domain performance, but we show here that such predictors can behave very differently in deployment domains. This ambiguity can lead to instability and poor model behavior in practice, and is a distinct failure mode from previously identified issues arising from structural mismatch between training and deployment domains. We show that this problem appears in a wide variety of practical ML pipelines, using examples from computer vision, medical imaging, natural language processing, clinical risk prediction based on electronic health records, and medical genomics. Our results show the need to explicitly account for underspecification in modeling pipelines that are intended for real-world deployment in any domain.
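
A toy construction makes the failure mode concrete: two features that are nearly collinear in the training domain admit many predictors with indistinguishable training loss, yet those predictors disagree sharply once the correlation breaks. Everything below is a synthetic illustration, not one of the paper's pipelines.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
causal = rng.standard_normal(n)
spurious = causal + 0.01 * rng.standard_normal(n)  # nearly collinear with the causal feature
X = np.stack([causal, spurious], axis=1)
y = causal + 0.1 * rng.standard_normal(n)

def fit(seed, steps=2000, lr=0.1):
    """Gradient descent from a random init; the ill-conditioned direction never converges,
    so the pipeline is underspecified: many weight vectors fit the training data equally well."""
    w = np.random.default_rng(seed).standard_normal(2)
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / n
    return w

w1, w2 = fit(1), fit(2)
# Equivalent performance in the training domain ...
print(((X @ w1 - y) ** 2).mean(), ((X @ w2 - y) ** 2).mean())
# ... but very different predictions once the two features decouple in deployment.
x_shift = np.array([1.0, -1.0])
print(x_shift @ w1, x_shift @ w2)
```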
