Kwang-Sung Jun

Graph Sparsifications using Neural Network Assisted Monte Carlo Tree Search

Nov 17, 2023
Alvin Chiu, Mithun Ghosh, Reyan Ahmed, Kwang-Sung Jun, Stephen Kobourov, Michael T. Goodrich

Graph neural networks have been successful in machine learning as well as in combinatorial and graph problems such as the Subgraph Isomorphism Problem and the Traveling Salesman Problem. We describe an approach for computing graph sparsifiers by combining a graph neural network and Monte Carlo Tree Search. We first train a graph neural network that takes a partial solution as input and proposes a new node to be added as output. This neural network is then used in a Monte Carlo tree search to compute a sparsifier. The proposed method consistently outperforms several standard approximation algorithms on different types of graphs and often finds the optimal solution.
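As a concrete illustration of the search loop described above, here is a minimal one-ply sketch of PUCT-style selection with a learned prior; the `gnn_prior` interface and its uniform placeholder scores are assumptions for illustration, not the paper's trained network.

```python
# One-ply PUCT-style search guided by a (stubbed) GNN prior.
import math, random

def gnn_prior(partial, candidates):
    # Placeholder prior: uniform scores; a trained GNN would go here.
    return {v: 1.0 / len(candidates) for v in candidates}

class Node:
    def __init__(self, partial, candidates):
        self.partial, self.candidates = partial, candidates
        self.children = {}                       # action -> Node
        self.N, self.W = {}, {}                  # visit counts, total value
        self.P = gnn_prior(partial, candidates)  # prior from the GNN

    def select(self, c_puct=1.0):
        total = 1 + sum(self.N.values())
        def puct(a):
            q = self.W.get(a, 0.0) / max(self.N.get(a, 0), 1)
            u = c_puct * self.P[a] * math.sqrt(total) / (1 + self.N.get(a, 0))
            return q + u
        return max(self.candidates, key=puct)

def rollout(partial, candidates, value):
    # Random playout to a complete solution, scored by `value`.
    remaining = list(candidates)
    random.shuffle(remaining)
    return value(partial | set(remaining[: len(remaining) // 2]))

def mcts(root, value, n_sims=100):
    for _ in range(n_sims):
        a = root.select()
        if a not in root.children:
            child_cands = [v for v in root.candidates if v != a]
            root.children[a] = Node(root.partial | {a}, child_cands)
        child = root.children[a]
        v = rollout(child.partial, child.candidates, value)
        root.N[a] = root.N.get(a, 0) + 1
        root.W[a] = root.W.get(a, 0.0) + v
    return max(root.N, key=root.N.get)  # most-visited node to add next

# Toy usage: prefer smaller (sparser) solutions.
root = Node(partial=set(), candidates=[0, 1, 2, 3])
next_node = mcts(root, value=lambda sol: -len(sol), n_sims=50)
```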

* arXiv admin note: substantial text overlap with arXiv:2305.00535 

Improved Regret Bounds of (Multinomial) Logistic Bandits via Regret-to-Confidence-Set Conversion

Oct 28, 2023
Junghyun Lee, Se-Young Yun, Kwang-Sung Jun

The logistic bandit is a ubiquitous framework for modeling users' choices, e.g., click vs. no click in advertisement recommender systems. We observe that prior works overlook or neglect dependencies on $S \geq \lVert \theta_\star \rVert_2$, where $\theta_\star \in \mathbb{R}^d$ is the unknown parameter vector, which is particularly problematic when $S$ is large, e.g., $S \geq d$. In this work, we improve the dependency on $S$ via a novel approach called {\it regret-to-confidence-set conversion (R2CS)}, which allows us to construct a convex confidence set based only on the \textit{existence} of an online learning algorithm with a regret guarantee. Using R2CS, we obtain a strict improvement in the regret bound w.r.t. $S$ in logistic bandits while retaining computational feasibility and the dependence on other factors such as $d$ and $T$. We apply our new confidence set to the regret analyses of logistic bandits with a new martingale concentration step that circumvents an additional factor of $S$. We then extend this analysis to multinomial logistic bandits and obtain similar improvements in the regret, showing the efficacy of R2CS. While we apply R2CS to the (multinomial) logistic model, R2CS is a generic approach for developing confidence sets that can be used for various models, and may thus be of independent interest.
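For intuition, here is a schematic sketch of the generic online-to-confidence-set idea that R2CS builds on (in the spirit of Abbasi-Yadkori et al., 2012); the regret term $B_n$ and martingale term $M_n$ are placeholders, not the paper's actual bound.

```latex
% An online learner over convex losses \ell_t with regret bound B_n
% satisfies, against every comparator \theta,
\[
  \sum_{t=1}^{n} \ell_t(\theta_t)
  \;\le\; \min_{\theta} \sum_{t=1}^{n} \ell_t(\theta) + B_n .
\]
% A martingale argument bounds \sum_t \ell_t(\theta_\star) - \ell_t(\theta_t)
% by some M_n with high probability, so the level set
\[
  \mathcal{C}_n \;=\; \Bigl\{ \theta \,:\, \sum_{t=1}^{n} \ell_t(\theta)
  \;\le\; \min_{\theta'} \sum_{t=1}^{n} \ell_t(\theta') + B_n + M_n \Bigr\}
\]
% contains \theta_\star with high probability; convexity of each \ell_t
% makes \mathcal{C}_n convex, the shape of guarantee that R2CS exploits.
```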

* 32 pages, 2 figures, 1 table 

Nearly Optimal Steiner Trees using Graph Neural Network Assisted Monte Carlo Tree Search

Apr 30, 2023
Reyan Ahmed, Mithun Ghosh, Kwang-Sung Jun, Stephen Kobourov

Graph neural networks are useful for learning problems, as well as for combinatorial and graph problems such as the Subgraph Isomorphism Problem and the Traveling Salesman Problem. We describe an approach for computing Steiner trees by combining a graph neural network and Monte Carlo Tree Search. We first train a graph neural network that takes a partial solution as input and proposes a new node to be added as output. This neural network is then used in a Monte Carlo tree search to compute a Steiner tree. The proposed method consistently outperforms the standard 2-approximation algorithm on many different types of graphs and often finds the optimal solution.
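To make the node-proposal component concrete, below is a hypothetical two-layer message-passing scorer; the architecture, feature encoding, and `ProposalGNN` interface are illustrative assumptions, not the paper's trained model.

```python
# Hypothetical per-node scorer over (normalized adjacency, node features),
# where one feature flags membership in the partial solution.
import torch
import torch.nn as nn

class ProposalGNN(nn.Module):
    def __init__(self, in_dim=2, hid=32):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hid)
        self.lin2 = nn.Linear(hid, hid)
        self.out = nn.Linear(hid, 1)

    def forward(self, adj, feats):
        # adj: (n, n) normalized adjacency; feats: (n, in_dim)
        h = torch.relu(self.lin1(adj @ feats))  # 1st message pass
        h = torch.relu(self.lin2(adj @ h))      # 2nd message pass
        return self.out(h).squeeze(-1)          # per-node logits

# feats[:, 0] = 1 if the node is in the partial tree,
# feats[:, 1] = 1 if the node is a terminal that must be connected.
n = 5
logits = ProposalGNN()(torch.eye(n), torch.zeros(n, 2))
```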

Kullback-Leibler Maillard Sampling for Multi-armed Bandits with Bounded Rewards

Apr 28, 2023
Hao Qin, Kwang-Sung Jun, Chicheng Zhang

We study $K$-armed bandit problems where the reward distributions of the arms are all supported on the $[0,1]$ interval. It has been a challenge to design regret-efficient randomized exploration algorithms in this setting. Maillard sampling~\cite{maillard13apprentissage}, an attractive alternative to Thompson sampling, has recently been shown to achieve competitive regret guarantees in the sub-Gaussian reward setting~\cite{bian2022maillard} while maintaining closed-form action probabilities, which is useful for offline policy evaluation. In this work, we propose the Kullback-Leibler Maillard Sampling (KL-MS) algorithm, a natural extension of Maillard sampling that achieves a KL-style gap-dependent regret bound. We show that KL-MS is asymptotically optimal when the rewards are Bernoulli and has a worst-case regret bound of the form $O(\sqrt{\mu^*(1-\mu^*) K T \ln K} + K \ln T)$, where $\mu^*$ is the expected reward of the optimal arm and $T$ is the time horizon length.
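The closed-form action probabilities make the sampling rule easy to state. Below is a hedged sketch assuming the rule takes the Maillard form with the binary KL divergence; any clipping constants or edge cases in the paper may differ.

```python
# KL-Maillard-style sampling sketch: play arm a with probability
# proportional to exp(-N_a * kl(mu_hat_a, mu_hat_max)).
import math, random

def kl_bernoulli(p, q, eps=1e-12):
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def klms_probs(means, counts):
    mu_max = max(means)
    w = [math.exp(-n * kl_bernoulli(m, mu_max)) for m, n in zip(means, counts)]
    z = sum(w)
    return [x / z for x in w]

# One bandit round: sample an arm from the closed-form distribution.
means, counts = [0.4, 0.55, 0.5], [20, 25, 18]
probs = klms_probs(means, counts)
arm = random.choices(range(len(means)), weights=probs)[0]
```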

Tighter PAC-Bayes Bounds Through Coin-Betting

Feb 12, 2023
Kyoungseok Jang, Kwang-Sung Jun, Ilja Kuzborskij, Francesco Orabona

We consider the problem of estimating the mean of a sequence of random elements $f(X_1, \theta), \ldots, f(X_n, \theta)$ where $f$ is a fixed scalar function, $S=(X_1, \ldots, X_n)$ are independent random variables, and $\theta$ is a possibly $S$-dependent parameter. An example of such a problem would be to estimate the generalization error of a neural network trained on $n$ examples where $f$ is a loss function. Classically, this problem is approached through concentration inequalities holding uniformly over compact parameter sets of functions $f$, for example as in Rademacher or VC type analysis. However, in many problems, such inequalities often yield numerically vacuous estimates. Recently, the \emph{PAC-Bayes} framework has been proposed as a better alternative for this class of problems for its ability to often give numerically non-vacuous bounds. In this paper, we show that we can do even better: we show how to refine the proof strategy of the PAC-Bayes bounds and achieve \emph{even tighter} guarantees. Our approach is based on the \emph{coin-betting} framework that derives the numerically tightest known time-uniform concentration inequalities from the regret guarantees of online gambling algorithms. In particular, we derive the first PAC-Bayes concentration inequality based on the coin-betting approach that holds simultaneously for all sample sizes. We demonstrate its tightness by showing that \emph{relaxing} it yields a number of previous results in closed form, including the Bernoulli-KL and empirical Bernstein inequalities. Finally, we propose an efficient algorithm to numerically calculate confidence sequences from our bound, which often generates non-vacuous confidence bounds even with one sample, unlike the state-of-the-art PAC-Bayes bounds.
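For context, the coin-betting concentration machinery can be summarized as follows (a standard sketch, not the paper's PAC-Bayes bound itself): bet a predictable fraction $\lambda_t$ on each outcome and apply Ville's inequality to the resulting wealth process.

```latex
% Wealth from betting a predictable fraction lambda_t on X_t - mu,
% with X_t in [0,1] so that the factors stay nonnegative:
\[
  W_n(\mu) \;=\; \prod_{t=1}^{n} \bigl(1 + \lambda_t\,(X_t - \mu)\bigr),
  \qquad \lambda_t \in (-1, 1) \ \text{predictable}.
\]
% If mu is the true conditional mean, (W_n) is a nonnegative
% supermartingale with W_0 = 1, so Ville's inequality gives a
% time-uniform guarantee:
\[
  \Pr\Bigl(\exists n \ge 1:\ W_n(\mu) \ge 1/\delta\Bigr) \le \delta
  \;\Longrightarrow\;
  C_n = \{\mu : W_n(\mu) < 1/\delta\}
\]
% is a confidence sequence; regret-optimal betting strategies yield
% tighter sets, which is the engine behind the tighter PAC-Bayes bound.
```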

Revisiting Simple Regret Minimization in Multi-Armed Bandits

Oct 30, 2022
Yao Zhao, Connor Stephens, Csaba Szepesvári, Kwang-Sung Jun

Simple regret is a natural and parameter-free performance criterion for identifying a good arm in multi-armed bandits, yet it is less popular than the probability of missing the best arm or an $\epsilon$-good arm, perhaps due to the lack of easy ways to characterize it. In this paper, we achieve improved simple regret upper bounds for both the data-rich ($T\ge n$) and data-poor ($T \le n$) regimes, where $n$ is the number of arms and $T$ is the number of samples. At its heart is an improved analysis of the well-known Sequential Halving (SH) algorithm that bounds the probability of returning an arm whose mean reward is not within $\epsilon$ of the best (i.e., not $\epsilon$-good) for any choice of $\epsilon>0$, even though $\epsilon$ is not an input to SH. We show that this directly implies an optimal simple regret bound of $\mathcal{O}(\sqrt{n/T})$. Furthermore, our upper bound gets smaller as a function of the number of $\epsilon$-good arms. This results in an accelerated rate for the $(\epsilon,\delta)$-PAC criterion, which closes the gap between the upper and lower bounds in prior art. For the more challenging data-poor regime, we propose Bracketing SH (BSH) that enjoys the same improvement even without sampling each arm at least once. Our empirical study shows that BSH outperforms existing methods on real-world tasks.
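Since the analysis centers on Sequential Halving, a minimal sketch of the standard algorithm may help; the per-round budget split follows the usual $T/(|S_r|\lceil\log_2 n\rceil)$ allocation.

```python
# Sequential Halving: split the budget T across ceil(log2(n)) rounds,
# sample surviving arms equally with fresh samples, keep the top half.
import math, random

def sequential_halving(pull, n, T):
    """pull(a) returns one reward sample from arm a in [0, 1]."""
    arms = list(range(n))
    rounds = math.ceil(math.log2(n))
    for _ in range(rounds):
        budget = max(T // (len(arms) * rounds), 1)  # pulls per arm
        means = {}
        for a in arms:
            means[a] = sum(pull(a) for _ in range(budget)) / budget
        arms = sorted(arms, key=means.get, reverse=True)[: max(1, len(arms) // 2)]
    return arms[0]

# Example with Bernoulli arms:
mus = [0.3, 0.5, 0.45, 0.7, 0.6, 0.2, 0.35, 0.5]
best = sequential_halving(lambda a: float(random.random() < mus[a]),
                          n=len(mus), T=4000)
```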

PopArt: Efficient Sparse Regression and Experimental Design for Optimal Sparse Linear Bandits

Oct 25, 2022
Kyoungseok Jang, Chicheng Zhang, Kwang-Sung Jun

In sparse linear bandits, a learning agent sequentially selects an action and receives reward feedback, and the reward function depends linearly on a few coordinates of the covariates of the actions. This has applications in many real-world sequential decision making problems. In this paper, we propose a simple and computationally efficient sparse linear estimation method called PopArt that enjoys a tighter $\ell_1$ recovery guarantee compared to Lasso (Tibshirani, 1996) in many problems. Our bound naturally motivates an experimental design criterion that is convex and thus computationally efficient to solve. Based on our novel estimator and design criterion, we derive sparse linear bandit algorithms that enjoy improved regret upper bounds over the state of the art (Hao et al., 2020), especially w.r.t. the geometry of the given action set. Finally, we prove a matching lower bound for sparse linear bandits in the data-poor regime, which closes the gap between upper and lower bounds in prior work.
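The following numpy sketch illustrates one plausible reading of the estimator's structure: a whitened per-sample unbiased estimate that is averaged and then hard-thresholded. The design, the threshold choice, and plain averaging (in place of any robust per-coordinate averaging the paper may use) are assumptions for illustration only.

```python
# Whitened-average-then-threshold sketch: Q^{-1} a_t y_t is an unbiased
# per-sample estimate of theta, so average it and hard-threshold small
# coordinates to exploit sparsity.
import numpy as np

rng = np.random.default_rng(0)
d, n, s, sigma = 50, 2000, 3, 0.1
theta = np.zeros(d); theta[:s] = [1.0, -0.8, 0.5]   # sparse parameter

A = rng.choice([-1.0, 1.0], size=(n, d))            # design: random signs
Q = np.eye(d)                                       # E[a a^T] for this design
y = A @ theta + sigma * rng.standard_normal(n)

theta_bar = np.linalg.solve(Q, (A * y[:, None]).mean(axis=0))
tau = 3 * sigma * np.sqrt(np.log(d) / n)            # threshold ~ noise level
theta_hat = np.where(np.abs(theta_bar) > tau, theta_bar, 0.0)
```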

* 10 pages, 1 figure, to be published in the 2022 Conference on Neural Information Processing Systems 

Norm-Agnostic Linear Bandits

May 03, 2022
Spencer Gales, Sunder Sethuraman, Kwang-Sung Jun

Linear bandits have a wide variety of applications, including recommendation systems, yet they make one strong assumption: the algorithm must know an upper bound $S$ on the norm of the unknown parameter $\theta^*$ that governs the reward generation. Such an assumption forces the practitioner to guess the value of $S$ used in the confidence bound, leaving no choice but to hope that $\|\theta^*\|\le S$ is true in order to guarantee low regret. In this paper, we propose novel algorithms that, for the first time, do not require such knowledge. Specifically, we propose two algorithms and analyze their regret bounds: one for the changing arm set setting and the other for the fixed arm set setting. Our regret bound for the former shows that the price of not knowing $S$ does not affect the leading term in the regret bound and inflates only the lower order term. For the latter, we do not pay any price in the regret for not knowing $S$. Our numerical experiments show that standard algorithms assuming knowledge of $S$ can fail catastrophically when $\|\theta^*\|\le S$ does not hold, whereas our algorithms enjoy low regret.
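To see why guessing $S$ matters, the sketch below shows where $S$ enters a standard OFUL-style confidence width (Abbasi-Yadkori et al., 2011); it illustrates the problem the paper removes, not the paper's norm-agnostic algorithms.

```python
# OFUL-style confidence width: the guessed norm bound S appears directly
# in the exploration radius beta, so an underestimate can exclude theta*.
import numpy as np

def oful_width(V, x, t, S, lam=1.0, sigma=1.0, delta=0.05):
    d = len(x)
    beta = sigma * np.sqrt(d * np.log((1 + t / lam) / delta)) + np.sqrt(lam) * S
    return beta * np.sqrt(x @ np.linalg.solve(V, x))  # beta * ||x||_{V^{-1}}

# If the guess S < ||theta*||, the confidence interval can exclude the
# true parameter and the algorithm may lock onto a suboptimal arm.
```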

* AISTATS'22; added acknowledgements 

An Experimental Design Approach for Regret Minimization in Logistic Bandits

Feb 04, 2022
Blake Mason, Kwang-Sung Jun, Lalit Jain

In this work we consider the problem of regret minimization for logistic bandits. The main challenge of logistic bandits is reducing the dependence on a potentially large problem-dependent constant $\kappa$ that can at worst scale exponentially with the norm of the unknown parameter $\theta_{\ast}$. Abeille et al. (2021) applied self-concordance of the logistic function to remove this worst-case dependence, providing regret guarantees like $O(d\log^2(\kappa)\sqrt{\dot\mu T}\log(|\mathcal{X}|))$, where $d$ is the dimensionality, $T$ is the time horizon, and $\dot\mu$ is the variance of the best arm. This work improves upon this bound in the fixed arm setting by employing an experimental design procedure that achieves a minimax regret of $O(\sqrt{d \dot\mu T\log(|\mathcal{X}|)})$. Our regret bound is in fact the first tighter instance-dependent (i.e., gap-dependent) regret bound for logistic bandits. We also propose a new warmup sampling algorithm that can dramatically reduce the lower order term in the regret in general, and prove that it can reduce the lower order term's dependence on $\kappa$ to $\log^2(\kappa)$ for some instances. Finally, we discuss the impact of the bias of the MLE on the logistic bandit problem, providing an example where the $d^2$ lower order regret (cf. $d$ for linear bandits) may not be improvable as long as the MLE is used, and how bias-corrected estimators may be used to bring it closer to $d$.
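For a flavor of the experimental design component, here is a hedged Frank-Wolfe sketch for a G-optimal-style criterion over a fixed arm set; folding the logistic variance factors in as per-arm weights `w` is an assumption about the exact criterion, not the paper's procedure.

```python
# Frank-Wolfe for a G-optimal-style design over a fixed arm set X,
# with per-arm variance weights w standing in for logistic mu-dot factors.
import numpy as np

def g_optimal_design(X, w, iters=200):
    n, d = X.shape
    lam = np.full(n, 1.0 / n)                     # initial design weights
    for t in range(iters):
        V = (X * (lam * w)[:, None]).T @ X        # weighted info matrix
        Vinv = np.linalg.pinv(V)
        g = np.einsum('ij,jk,ik->i', X, Vinv, X)  # ||x||^2_{V^-1} per arm
        i = int(np.argmax(g))                     # most uncertain arm
        step = 2.0 / (t + 2)                      # standard FW step size
        lam = (1 - step) * lam
        lam[i] += step
    return lam

# Usage: allocate warmup samples proportionally to the returned weights.
```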

Jointly Efficient and Optimal Algorithms for Logistic Bandits

Jan 19, 2022
Louis Faury, Marc Abeille, Kwang-Sung Jun, Clément Calauzènes

Logistic Bandits have recently undergone careful scrutiny by virtue of their combined theoretical and practical relevance. This research effort delivered statistically efficient algorithms, improving the regret of previous strategies by exponentially large factors. Such algorithms are, however, strikingly costly as they require $\Omega(t)$ operations at each round. On the other hand, a different line of research focused on computational efficiency ($\mathcal{O}(1)$ per-round cost), but at the cost of letting go of the aforementioned exponential improvements. Obtaining the best of both worlds is unfortunately not a matter of marrying the two approaches. Instead, we introduce a new learning procedure for Logistic Bandits. It yields confidence sets whose sufficient statistics can be easily maintained online without sacrificing statistical tightness. Combined with efficient planning mechanisms, we design fast algorithms whose regret performance still matches the problem-dependent lower bound of Abeille et al. (2021). To the best of our knowledge, these are the first Logistic Bandit algorithms that simultaneously enjoy statistical and computational efficiency.
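As a rough illustration of online-maintainable sufficient statistics, the sketch below keeps an $O(d^2)$ running curvature matrix and performs one Newton-like step per round; it shows the generic constant-per-round pattern, not the paper's confidence-set construction.

```python
# Constant-per-round online logistic update: keep a running Hessian
# approximation H and take one Newton-like step per round, so the
# per-round cost is O(d^2) rather than Omega(t).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class OnlineLogistic:
    def __init__(self, d, lam=1.0):
        self.theta = np.zeros(d)
        self.H = lam * np.eye(d)     # running curvature statistic

    def update(self, x, r):          # x: context, r in {0, 1}: reward
        p = sigmoid(x @ self.theta)
        self.H += p * (1 - p) * np.outer(x, x)       # accumulate Hessian
        grad = (p - r) * x                           # per-round gradient
        self.theta -= np.linalg.solve(self.H, grad)  # one Newton-like step
```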
