Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Guang Cheng

Purdue

Nearly Optimal Variational Inference for High Dimensional Regression with Shrinkage Priors

Oct 24, 2020

Jincheng Bai, Qifan Song, Guang Cheng

Figure 1 for Nearly Optimal Variational Inference for High Dimensional Regression with Shrinkage Priors

Figure 2 for Nearly Optimal Variational Inference for High Dimensional Regression with Shrinkage Priors

Figure 3 for Nearly Optimal Variational Inference for High Dimensional Regression with Shrinkage Priors

Abstract:We propose a variational Bayesian (VB) procedure for high-dimensional linear model inferences with heavy tail shrinkage priors, such as student-t prior. Theoretically, we establish the consistency of the proposed VB method and prove that under the proper choice of prior specifications, the contraction rate of the VB posterior is nearly optimal. It justifies the validity of VB inference as an alternative of Markov Chain Monte Carlo (MCMC) sampling. Meanwhile, comparing to conventional MCMC methods, the VB procedure achieves much higher computational efficiency, which greatly alleviates the computing burden for modern machine learning applications such as massive data analysis. Through numerical studies, we demonstrate that the proposed VB method leads to shorter computing time, higher estimation accuracy, and lower variable selection error than competitive sparse Bayesian methods.

Via

Access Paper or Ask Questions

On the Generalization Properties of Adversarial Training

Aug 15, 2020

Yue Xing, Qifan Song, Guang Cheng

Figure 1 for On the Generalization Properties of Adversarial Training

Figure 2 for On the Generalization Properties of Adversarial Training

Figure 3 for On the Generalization Properties of Adversarial Training

Figure 4 for On the Generalization Properties of Adversarial Training

Abstract:Modern machine learning and deep learning models are shown to be vulnerable when testing data are slightly perturbed. Theoretical studies of adversarial training algorithms mostly focus on their adversarial training losses or local convergence properties. In contrast, this paper studies the generalization performance of a generic adversarial training algorithm. Specifically, we consider linear regression models and two-layer neural networks (with lazy training) using squared loss under both low-dimensional and high-dimensional regimes. In the former regime, the adversarial risk of the trained models will converge to the minimal adversarial risk. In the latter regime, we discover that data interpolation prevents the adversarial robust estimator from being consistent (i.e. converge in probability). Therefore, inspired by successes of the least absolute shrinkage and selection operator (LASSO), we incorporate the L1 penalty in the high dimensional adversarial learning, and show that it leads to consistent adversarial robust estimation in both theory and numerical trials.

Via

Access Paper or Ask Questions

Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network

Jul 06, 2020

Wenjia Wang, Tianyang Hu, Cong Lin, Guang Cheng

Figure 1 for Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network

Figure 2 for Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network

Figure 3 for Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network

Figure 4 for Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network

Abstract:Overparametrized neural networks trained by gradient descent (GD) can provably overfit any training data. However, the generalization guarantee may not hold for noisy data. From a nonparametric perspective, this paper studies how well overparametrized neural networks can recover the true target function in the presence of random noises. We establish a lower bound on the $L_2$ estimation error with respect to the GD iteration, which is away from zero without a delicate choice of early stopping. In turn, through a comprehensive analysis of $\ell_2$-regularized GD trajectories, we prove that for overparametrized one-hidden-layer ReLU neural network with the $\ell_2$ regularization: (1) the output is close to that of the kernel ridge regression with the corresponding neural tangent kernel; (2) minimax {optimal} rate of $L_2$ estimation error is achieved. Numerical experiments confirm our theory and further demonstrate that the $\ell_2$ regularization approach improves the training robustness and works for a wider range of neural networks.

Via

Access Paper or Ask Questions

Online Regularization for High-Dimensional Dynamic Pricing Algorithms

Jul 05, 2020

Chi-Hua Wang, Zhanyu Wang, Will Wei Sun, Guang Cheng

Figure 1 for Online Regularization for High-Dimensional Dynamic Pricing Algorithms

Figure 2 for Online Regularization for High-Dimensional Dynamic Pricing Algorithms

Figure 3 for Online Regularization for High-Dimensional Dynamic Pricing Algorithms

Figure 4 for Online Regularization for High-Dimensional Dynamic Pricing Algorithms

Abstract:We propose a novel \textit{online regularization} scheme for revenue-maximization in high-dimensional dynamic pricing algorithms. The online regularization scheme equips the proposed optimistic online regularized maximum likelihood pricing (\texttt{OORMLP}) algorithm with three major advantages: encode market noise knowledge into pricing process optimism; empower online statistical learning with always-validity over all decision points; envelop prediction error process with time-uniform non-asymptotic oracle inequalities. This type of non-asymptotic inference results allows us to design safer and more robust dynamic pricing algorithms in practice. In theory, the proposed \texttt{OORMLP} algorithm exploits the sparsity structure of high-dimensional models and obtains a logarithmic regret in a decision horizon. These theoretical advances are made possible by proposing an optimistic online LASSO procedure that resolves dynamic pricing problems at the \textit{process} level, based on a novel use of non-asymptotic martingale concentration. In experiments, we evaluate \texttt{OORMLP} in different synthetic pricing problem settings and observe that \texttt{OORMLP} performs better than \texttt{RMLP} proposed in \cite{javanmard2019dynamic}.

Via

Access Paper or Ask Questions

Directional Pruning of Deep Neural Networks

Jun 16, 2020

Shih-Kang Chao, Zhanyu Wang, Yue Xing, Guang Cheng

Figure 1 for Directional Pruning of Deep Neural Networks

Figure 2 for Directional Pruning of Deep Neural Networks

Figure 3 for Directional Pruning of Deep Neural Networks

Figure 4 for Directional Pruning of Deep Neural Networks

Abstract:In the light of the fact that the stochastic gradient descent (SGD) often finds a flat minimum valley in the training loss, we propose a novel directional pruning method which searches for a sparse minimizer in that flat region. The proposed pruning method is automatic in the sense that neither retraining nor expert knowledge is required. To overcome the computational formidability of estimating the flat directions, we propose to use a carefully tuned $\ell_1$ proximal gradient algorithm which can provably achieve the directional pruning with a small learning rate after sufficient training. The empirical results show that our algorithm performs competitively in highly sparse regime (92\% sparsity) among many existing automatic pruning methods on the ResNet50 with the ImageNet, while using only a slightly higher wall time and memory footprint than the SGD. Using the VGG16 and the wide ResNet 28x10 on the CIFAR-10 and CIFAR-100, we demonstrate that our algorithm reaches the same minima valley as the SGD, and the minima found by our algorithm and the SGD do not deviate in directions that impact the training loss.

* 29 pages

Via

Access Paper or Ask Questions

On Deep Instrumental Variables Estimate

Apr 30, 2020

Ruiqi Liu, Zuofeng Shang, Guang Cheng

Figure 1 for On Deep Instrumental Variables Estimate

Figure 2 for On Deep Instrumental Variables Estimate

Figure 3 for On Deep Instrumental Variables Estimate

Figure 4 for On Deep Instrumental Variables Estimate

Abstract:The endogeneity issue is fundamentally important as many empirical applications may suffer from the omission of explanatory variables, measurement error, or simultaneous causality. Recently, \cite{hllt17} propose a "Deep Instrumental Variable (IV)" framework based on deep neural networks to address endogeneity, demonstrating superior performances than existing approaches. The aim of this paper is to theoretically understand the empirical success of the Deep IV. Specifically, we consider a two-stage estimator using deep neural networks in the linear instrumental variables model. By imposing a latent structural assumption on the reduced form equation between endogenous variables and instrumental variables, the first-stage estimator can automatically capture this latent structure and converge to the optimal instruments at the minimax optimal rate, which is free of the dimension of instrumental variables and thus mitigates the curse of dimensionality. Additionally, in comparison with classical methods, due to the faster convergence rate of the first-stage estimator, the second-stage estimator has {a smaller (second order) estimation error} and requires a weaker condition on the smoothness of the optimal instruments. Given that the depth and width of the employed deep neural network are well chosen, we further show that the second-stage estimator achieves the semiparametric efficiency bound. Simulation studies on synthetic data and application to automobile market data confirm our theory.

Via

Access Paper or Ask Questions

Online Batch Decision-Making with High-Dimensional Covariates

Feb 27, 2020

Chi-Hua Wang, Guang Cheng

Figure 1 for Online Batch Decision-Making with High-Dimensional Covariates

Figure 2 for Online Batch Decision-Making with High-Dimensional Covariates

Figure 3 for Online Batch Decision-Making with High-Dimensional Covariates

Abstract:We propose and investigate a class of new algorithms for sequential decision making that interacts with \textit{a batch of users} simultaneously instead of \textit{a user} at each decision epoch. This type of batch models is motivated by interactive marketing and clinical trial, where a group of people are treated simultaneously and the outcomes of the whole group are collected before the next stage of decision. In such a scenario, our goal is to allocate a batch of treatments to maximize treatment efficacy based on observed high-dimensional user covariates. We deliver a solution, named \textit{Teamwork LASSO Bandit algorithm}, that resolves a batch version of explore-exploit dilemma via switching between teamwork stage and selfish stage during the whole decision process. This is made possible based on statistical properties of LASSO estimate of treatment efficacy that adapts to a sequence of batch observations. In general, a rate of optimal allocation condition is proposed to delineate the exploration and exploitation trade-off on the data collection scheme, which is sufficient for LASSO to identify the optimal treatment for observed user covariates. An upper bound on expected cumulative regret of the proposed algorithm is provided.

Via

Access Paper or Ask Questions

Simultaneous Inference for Massive Data: Distributed Bootstrap

Feb 19, 2020

Yang Yu, Shih-Kang Chao, Guang Cheng

Figure 1 for Simultaneous Inference for Massive Data: Distributed Bootstrap

Figure 2 for Simultaneous Inference for Massive Data: Distributed Bootstrap

Figure 3 for Simultaneous Inference for Massive Data: Distributed Bootstrap

Figure 4 for Simultaneous Inference for Massive Data: Distributed Bootstrap

Abstract:In this paper, we propose a bootstrap method applied to massive data processed distributedly in a large number of machines. This new method is computationally efficient in that we bootstrap on the master machine without over-resampling, typically required by existing methods \cite{kleiner2014scalable,sengupta2016subsampled}, while provably achieving optimal statistical efficiency with minimal communication. Our method does not require repeatedly re-fitting the model but only applies multiplier bootstrap in the master machine on the gradients received from the worker machines. Simulations validate our theory.

Via

Access Paper or Ask Questions

Residual Bootstrap Exploration for Bandit Algorithms

Feb 19, 2020

Chi-Hua Wang, Yang Yu, Botao Hao, Guang Cheng

Figure 1 for Residual Bootstrap Exploration for Bandit Algorithms

Figure 2 for Residual Bootstrap Exploration for Bandit Algorithms

Figure 3 for Residual Bootstrap Exploration for Bandit Algorithms

Figure 4 for Residual Bootstrap Exploration for Bandit Algorithms

Abstract:In this paper, we propose a novel perturbation-based exploration method in bandit algorithms with bounded or unbounded rewards, called residual bootstrap exploration (\texttt{ReBoot}). The \texttt{ReBoot} enforces exploration by injecting data-driven randomness through a residual-based perturbation mechanism. This novel mechanism captures the underlying distributional properties of fitting errors, and more importantly boosts exploration to escape from suboptimal solutions (for small sample sizes) by inflating variance level in an \textit{unconventional} way. In theory, with appropriate variance inflation level, \texttt{ReBoot} provably secures instance-dependent logarithmic regret in Gaussian multi-armed bandits. We evaluate the \texttt{ReBoot} in different synthetic multi-armed bandits problems and observe that the \texttt{ReBoot} performs better for unbounded rewards and more robustly than \texttt{Giro} \cite{kveton2018garbage} and \texttt{PHE} \cite{kveton2019perturbed}, with comparable computational efficiency to the Thompson sampling method.

* The first two authors contributed equally

Via

Access Paper or Ask Questions

Predictive Power of Nearest Neighbors Algorithm under Random Perturbation

Feb 13, 2020

Yue Xing, Qifan Song, Guang Cheng

Figure 1 for Predictive Power of Nearest Neighbors Algorithm under Random Perturbation

Figure 2 for Predictive Power of Nearest Neighbors Algorithm under Random Perturbation

Figure 3 for Predictive Power of Nearest Neighbors Algorithm under Random Perturbation

Figure 4 for Predictive Power of Nearest Neighbors Algorithm under Random Perturbation

Abstract:We consider a data corruption scenario in the classical $k$ Nearest Neighbors ($k$-NN) algorithm, that is, the testing data are randomly perturbed. Under such a scenario, the impact of corruption level on the asymptotic regret is carefully characterized. In particular, our theoretical analysis reveals a phase transition phenomenon that, when the corruption level $\omega$ is below a critical order (i.e., small-$\omega$ regime), the asymptotic regret remains the same; when it is beyond that order (i.e., large-$\omega$ regime), the asymptotic regret deteriorates polynomially. Surprisingly, we obtain a negative result that the classical noise-injection approach will not help improve the testing performance in the beginning stage of the large-$\omega$ regime, even in the level of the multiplicative constant of asymptotic regret. As a technical by-product, we prove that under different model assumptions, the pre-processed 1-NN proposed in \cite{xue2017achieving} will at most achieve a sub-optimal rate when the data dimension $d>4$ even if $k$ is chosen optimally in the pre-processing step.

Via

Access Paper or Ask Questions