Seok-Jin Kim

Experimental Design for Semiparametric Bandits

Jun 16, 2025

Seok-Jin Kim, Gi-Soo Kim, Min-hwan Oh

Abstract: We study finite-armed semiparametric bandits, where each arm's reward combines a linear component with an unknown, potentially adversarial shift. This model strictly generalizes classical linear bandits and reflects complexities common in practice. We propose the first experimental-design approach that simultaneously offers a sharp regret bound, a PAC bound, and a best-arm identification guarantee. Our method attains the minimax regret $\tilde{O}(\sqrt{dT})$, matching the known lower bound for finite-armed linear bandits, and further achieves logarithmic regret under a positive suboptimality gap condition. These guarantees follow from our refined non-asymptotic analysis of orthogonalized regression that attains the optimal $\sqrt{d}$ rate, paving the way for robust and efficient learning across a broad class of semiparametric bandit problems.

* Accepted at COLT 2025
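
For readers unfamiliar with the setting, the reward model the abstract describes can be written out as below. This is a sketch in notation assumed here, not taken from the paper: $x_a \in \mathbb{R}^d$ is the feature vector of arm $a$, $\theta^*$ the unknown parameter, $\nu_t$ the unknown shift at round $t$, and $\eta_t$ zero-mean noise. It also assumes, as is common in this literature, that the shift does not depend on the chosen arm, so it cancels in the regret.

```latex
% Reward observed after pulling arm a_t at round t (assumed notation)
r_t = x_{a_t}^\top \theta^* + \nu_t + \eta_t

% Regret over T rounds; the shift \nu_t is common to all arms and cancels
\mathrm{Reg}(T) = \sum_{t=1}^{T} \Bigl( \max_{a} x_a^\top \theta^* - x_{a_t}^\top \theta^* \Bigr)
```

Setting $\nu_t = 0$ for every round recovers the classical linear bandit, which is the sense in which the semiparametric model generalizes it.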

Local Anti-Concentration Class: Logarithmic Regret for Greedy Linear Contextual Bandit

Nov 19, 2024

Seok-Jin Kim, Min-hwan Oh

Abstract: We study the performance guarantees of exploration-free greedy algorithms for the linear contextual bandit problem. We introduce a novel condition, named the Local Anti-Concentration (LAC) condition, which enables a greedy bandit algorithm to achieve provable efficiency. We show that the LAC condition is satisfied by a broad class of distributions, including Gaussian, exponential, uniform, Cauchy, and Student's $t$ distributions, along with other exponential family distributions and their truncated variants. This significantly expands the class of distributions under which greedy algorithms can perform efficiently. Under our proposed LAC condition, we prove that the cumulative expected regret of the greedy algorithm for the linear contextual bandit is bounded by $O(\operatorname{poly} \log T)$. Our results establish the widest range of distributions known to date that allow a sublinear regret bound for greedy algorithms, further achieving a sharp poly-logarithmic regret.

* NeurIPS 2024
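
To make "exploration-free greedy" concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes Gaussian contexts drawn fresh for each arm every round, a ridge-regression estimate of the parameter, and Gaussian reward noise. The function name `greedy_linear_bandit` and all of its parameters are hypothetical choices made for illustration.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def greedy_linear_bandit(theta, n_arms=5, horizon=2000,
                         noise_sd=0.1, reg=1.0, seed=0):
    """Simulate a greedy (no exploration bonus) policy; return cumulative regret."""
    rng = random.Random(seed)
    d = len(theta)
    # Inverse of the regularized Gram matrix A = reg * I + sum of x x^T,
    # kept up to date with Sherman-Morrison rank-one updates.
    a_inv = [[(1.0 / reg if i == j else 0.0) for j in range(d)]
             for i in range(d)]
    b = [0.0] * d  # running sum of reward * context
    regret = 0.0
    for _ in range(horizon):
        # Assumption: every arm's context is i.i.d. standard Gaussian.
        contexts = [[rng.gauss(0.0, 1.0) for _ in range(d)]
                    for _ in range(n_arms)]
        theta_hat = [dot(row, b) for row in a_inv]  # ridge estimate A^{-1} b
        # Greedy step: play the arm with the highest estimated reward.
        x = max(contexts, key=lambda c: dot(c, theta_hat))
        reward = dot(x, theta) + rng.gauss(0.0, noise_sd)
        # Instantaneous regret against the best arm for this round.
        regret += max(dot(c, theta) for c in contexts) - dot(x, theta)
        # Sherman-Morrison update of a_inv for A <- A + x x^T.
        ax = [dot(row, x) for row in a_inv]
        denom = 1.0 + dot(x, ax)
        for i in range(d):
            for j in range(d):
                a_inv[i][j] -= ax[i] * ax[j] / denom
        for i in range(d):
            b[i] += reward * x[i]
    return regret


if __name__ == "__main__":
    for horizon in (500, 1000, 2000):
        r = greedy_linear_bandit([1.0, -0.5, 0.25], horizon=horizon)
        print(horizon, round(r, 3))
```

Running the script prints cumulative regret at a few horizons, which gives a rough empirical feel for how slowly regret grows under Gaussian contexts. It is an illustration only and does not verify the LAC condition or the paper's bound.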
