Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Quanquan Gu

Learning Two-Player Mixture Markov Games: Kernel Function Approximation and Correlated Equilibrium

Aug 10, 2022

Chris Junchi Li, Dongruo Zhou, Quanquan Gu, Michael I. Jordan

Abstract:We consider learning Nash equilibria in two-player zero-sum Markov Games with nonlinear function approximation, where the action-value function is approximated by a function in a Reproducing Kernel Hilbert Space (RKHS). The key challenge is how to do exploration in the high-dimensional function space. We propose a novel online learning algorithm to find a Nash equilibrium by minimizing the duality gap. At the core of our algorithms are upper and lower confidence bounds that are derived based on the principle of optimism in the face of uncertainty. We prove that our algorithm is able to attain an $O(\sqrt{T})$ regret with polynomial computational complexity, under very mild assumptions on the reward function and the underlying dynamic of the Markov Games. We also propose several extensions of our algorithm, including an algorithm with Bernstein-type bonus that can achieve a tighter regret bound, and another algorithm for model misspecification that can be applied to neural function approximation.

* 42 pages

Via

Access Paper or Ask Questions

Towards Understanding Mixture of Experts in Deep Learning

Aug 04, 2022

Zixiang Chen, Yihe Deng, Yue Wu, Quanquan Gu, Yuanzhi Li

Figure 1 for Towards Understanding Mixture of Experts in Deep Learning

Figure 2 for Towards Understanding Mixture of Experts in Deep Learning

Figure 3 for Towards Understanding Mixture of Experts in Deep Learning

Figure 4 for Towards Understanding Mixture of Experts in Deep Learning

Abstract:The Mixture-of-Experts (MoE) layer, a sparsely-activated model controlled by a router, has achieved great success in deep learning. However, the understanding of such architecture remains elusive. In this paper, we formally study how the MoE layer improves the performance of neural network learning and why the mixture model will not collapse into a single model. Our empirical results suggest that the cluster structure of the underlying problem and the non-linearity of the expert are pivotal to the success of MoE. To further understand this, we consider a challenging classification problem with intrinsic cluster structures, which is hard to learn using a single expert. Yet with the MoE layer, by choosing the experts as two-layer nonlinear convolutional neural networks (CNNs), we show that the problem can be learned successfully. Furthermore, our theory shows that the router can learn the cluster-center features, which helps divide the input complex problem into simpler linear classification sub-problems that individual experts can conquer. To our knowledge, this is the first result towards formally understanding the mechanism of the MoE layer for deep learning.

* 53 pages, 8 figures, 11 tables

Via

Access Paper or Ask Questions

The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift

Aug 03, 2022

Jingfeng Wu, Difan Zou, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

Figure 1 for The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift

Figure 2 for The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift

Abstract:We study linear regression under covariate shift, where the marginal distribution over the input covariates differs in the source and the target domains, while the conditional distribution of the output given the input covariates is similar across the two domains. We investigate a transfer learning approach with pretraining on the source data and finetuning based on the target data (both conducted by online SGD) for this problem. We establish sharp instance-dependent excess risk upper and lower bounds for this approach. Our bounds suggest that for a large class of linear regression instances, transfer learning with $O(N^2)$ source data (and scarce or no target data) is as effective as supervised learning with $N$ target data. In addition, we show that finetuning, even with only a small amount of target data, could drastically reduce the amount of source data required by pretraining. Our theory sheds light on the effectiveness and limitation of pretraining as well as the benefits of finetuning for tackling covariate shift problems.

* 32 pages, 1 figure, 1 table

Via

Access Paper or Ask Questions

A Simple and Provably Efficient Algorithm for Asynchronous Federated Contextual Linear Bandits

Jul 07, 2022

Jiafan He, Tianhao Wang, Yifei Min, Quanquan Gu

Figure 1 for A Simple and Provably Efficient Algorithm for Asynchronous Federated Contextual Linear Bandits

Figure 2 for A Simple and Provably Efficient Algorithm for Asynchronous Federated Contextual Linear Bandits

Figure 3 for A Simple and Provably Efficient Algorithm for Asynchronous Federated Contextual Linear Bandits

Abstract:We study federated contextual linear bandits, where $M$ agents cooperate with each other to solve a global contextual linear bandit problem with the help of a central server. We consider the asynchronous setting, where all agents work independently and the communication between one agent and the server will not trigger other agents' communication. We propose a simple algorithm named \texttt{FedLinUCB} based on the principle of optimism. We prove that the regret of \texttt{FedLinUCB} is bounded by $\tilde{O}(d\sqrt{\sum_{m=1}^M T_m})$ and the communication complexity is $\tilde{O}(dM^2)$, where $d$ is the dimension of the contextual vector and $T_m$ is the total number of interactions with the environment by $m$-th agent. To the best of our knowledge, this is the first provably efficient algorithm that allows fully asynchronous communication for federated contextual linear bandits, while achieving the same regret guarantee as in the single-agent setting.

* 25 pages, 1 figure, 2 tables

Via

Access Paper or Ask Questions

Computationally Efficient Horizon-Free Reinforcement Learning for Linear Mixture MDPs

May 23, 2022

Dongruo Zhou, Quanquan Gu

Figure 1 for Computationally Efficient Horizon-Free Reinforcement Learning for Linear Mixture MDPs

Abstract:Recent studies have shown that episodic reinforcement learning (RL) is not more difficult than contextual bandits, even with a long planning horizon and unknown state transitions. However, these results are limited to either tabular Markov decision processes (MDPs) or computationally inefficient algorithms for linear mixture MDPs. In this paper, we propose the first computationally efficient horizon-free algorithm for linear mixture MDPs, which achieves the optimal $\tilde O(d\sqrt{K} +d^2)$ regret up to logarithmic factors. Our algorithm adapts a weighted least square estimator for the unknown transitional dynamic, where the weight is both \emph{variance-aware} and \emph{uncertainty-aware}. When applying our weighted least square estimator to heterogeneous linear bandits, we can obtain an $\tilde O(d\sqrt{\sum_{k=1}^K \sigma_k^2} +d)$ regret in the first $K$ rounds, where $d$ is the dimension of the context and $\sigma_k^2$ is the variance of the reward in the $k$-th round. This also improves upon the best-known algorithms in this setting when $\sigma_k^2$'s are known.

* 33 pages, 1 table

Via

Access Paper or Ask Questions

Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions

May 13, 2022

Jiafan He, Dongruo Zhou, Tong Zhang, Quanquan Gu

Figure 1 for Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions

Abstract:We study the linear contextual bandit problem in the presence of adversarial corruption, where the reward at each round is corrupted by an adversary, and the corruption level (i.e., the sum of corruption magnitudes over the horizon) is $C\geq 0$. The best-known algorithms in this setting are limited in that they either are computationally inefficient or require a strong assumption on the corruption, or their regret is at least $C$ times worse than the regret without corruption. In this paper, to overcome these limitations, we propose a new algorithm based on the principle of optimism in the face of uncertainty. At the core of our algorithm is a weighted ridge regression where the weight of each chosen action depends on its confidence up to some threshold. We show that for both known $C$ and unknown $C$ cases, our algorithm with proper choice of hyperparameter achieves a regret that nearly matches the lower bounds. Thus, our algorithm is nearly optimal up to logarithmic factors for both cases. Notably, our algorithm achieves the near-optimal regret for both corrupted and uncorrupted cases ($C=0$) simultaneously.

* 29 pages, 1 table

Via

Access Paper or Ask Questions

On the Convergence of Certified Robust Training with Interval Bound Propagation

Mar 16, 2022

Yihan Wang, Zhouxing Shi, Quanquan Gu, Cho-Jui Hsieh

Figure 1 for On the Convergence of Certified Robust Training with Interval Bound Propagation

Abstract:Interval Bound Propagation (IBP) is so far the base of state-of-the-art methods for training neural networks with certifiable robustness guarantees when potential adversarial perturbations present, while the convergence of IBP training remains unknown in existing literature. In this paper, we present a theoretical analysis on the convergence of IBP training. With an overparameterized assumption, we analyze the convergence of IBP robust training. We show that when using IBP training to train a randomly initialized two-layer ReLU neural network with logistic loss, gradient descent can linearly converge to zero robust training error with a high probability if we have sufficiently small perturbation radius and large network width.

* ICLR 2022

Via

Access Paper or Ask Questions

Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime

Mar 07, 2022

Difan Zou, Jingfeng Wu, Vladimir Braverman, Quanquan Gu, Sham M. Kakade

Figure 1 for Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime

Figure 2 for Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime

Abstract:Stochastic gradient descent (SGD) has achieved great success due to its superior performance in both optimization and generalization. Most of existing generalization analyses are made for single-pass SGD, which is a less practical variant compared to the commonly-used multi-pass SGD. Besides, theoretical analyses for multi-pass SGD often concern a worst-case instance in a class of problems, which may be pessimistic to explain the superior generalization ability for some particular problem instance. The goal of this paper is to sharply characterize the generalization of multi-pass SGD, by developing an instance-dependent excess risk bound for least squares in the interpolation regime, which is expressed as a function of the iteration number, stepsize, and data covariance. We show that the excess risk of SGD can be exactly decomposed into the excess risk of GD and a positive fluctuation error, suggesting that SGD always performs worse, instance-wisely, than GD, in generalization. On the other hand, we show that although SGD needs more iterations than GD to achieve the same level of excess risk, it saves the number of stochastic gradient evaluations, and therefore is preferable in terms of computational time.

* 28 pages, 2 figures

Via

Access Paper or Ask Questions

Bandit Learning with General Function Classes: Heteroscedastic Noise and Variance-dependent Regret Bounds

Feb 28, 2022

Heyang Zhao, Dongruo Zhou, Jiafan He, Quanquan Gu

Figure 1 for Bandit Learning with General Function Classes: Heteroscedastic Noise and Variance-dependent Regret Bounds

Abstract:We consider learning a stochastic bandit model, where the reward function belongs to a general class of uniformly bounded functions, and the additive noise can be heteroscedastic. Our model captures contextual linear bandits and generalized linear bandits as special cases. While previous works (Kirschner and Krause, 2018; Zhou et al., 2021) based on weighted ridge regression can deal with linear bandits with heteroscedastic noise, they are not directly applicable to our general model due to the curse of nonlinearity. In order to tackle this problem, we propose a multi-level learning framework for the general bandit model. The core idea of our framework is to partition the observed data into different levels according to the variance of their respective reward and perform online learning at each level collaboratively. Under our framework, we first design an algorithm that constructs the variance-aware confidence set based on empirical risk minimization and prove a variance-dependent regret bound. For generalized linear bandits, we further propose an algorithm based on follow-the-regularized-leader (FTRL) subroutine and online-to-confidence-set conversion, which can achieve a tighter variance-dependent regret under certain conditions.

* 33 pages, 1 table

Via

Access Paper or Ask Questions

Benign Overfitting in Two-layer Convolutional Neural Networks

Feb 14, 2022

Yuan Cao, Zixiang Chen, Mikhail Belkin, Quanquan Gu

Figure 1 for Benign Overfitting in Two-layer Convolutional Neural Networks

Abstract:Modern neural networks often have great expressive power and can be trained to overfit the training data, while still achieving a good test performance. This phenomenon is referred to as "benign overfitting". Recently, there emerges a line of works studying "benign overfitting" from the theoretical perspective. However, they are limited to linear models or kernel/random feature models, and there is still a lack of theoretical understanding about when and how benign overfitting occurs in neural networks. In this paper, we study the benign overfitting phenomenon in training a two-layer convolutional neural network (CNN). We show that when the signal-to-noise ratio satisfies a certain condition, a two-layer CNN trained by gradient descent can achieve arbitrarily small training and test loss. On the other hand, when this condition does not hold, overfitting becomes harmful and the obtained CNN can only achieve constant level test loss. These together demonstrate a sharp phase transition between benign overfitting and harmful overfitting, driven by the signal-to-noise ratio. To the best of our knowledge, this is the first work that precisely characterizes the conditions under which benign overfitting can occur in training convolutional neural networks.

* 35 pages, 1 figure

Via

Access Paper or Ask Questions