Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Thomas Kesselheim

Contextual Learning for Stochastic Optimization

May 22, 2025

Anna Heuser, Thomas Kesselheim

Abstract:Motivated by stochastic optimization, we introduce the problem of learning from samples of contextual value distributions. A contextual value distribution can be understood as a family of real-valued distributions, where each sample consists of a context $x$ and a random variable drawn from the corresponding real-valued distribution $D_x$. By minimizing a convex surrogate loss, we learn an empirical distribution $D'_x$ for each context, ensuring a small L\'evy distance to $D_x$. We apply this result to obtain the sample complexity bounds for the learning of an $\epsilon$-optimal policy for stochastic optimization problems defined on an unknown contextual value distribution. The sample complexity is shown to be polynomial for the general case of strongly monotone and stable optimization problems, including Single-item Revenue Maximization, Pandora's Box and Optimal Stopping.

* Full version of EC'25 paper

Via

Access Paper or Ask Questions

Online Combinatorial Allocations and Auctions with Few Samples

Sep 17, 2024

Paul Dütting, Thomas Kesselheim, Brendan Lucier, Rebecca Reiffenhäuser, Sahil Singla

Abstract:In online combinatorial allocations/auctions, n bidders sequentially arrive, each with a combinatorial valuation (such as submodular/XOS) over subsets of m indivisible items. The aim is to immediately allocate a subset of the remaining items to maximize the total welfare, defined as the sum of bidder valuations. A long line of work has studied this problem when the bidder valuations come from known independent distributions. In particular, for submodular/XOS valuations, we know 2-competitive algorithms/mechanisms that set a fixed price for each item and the arriving bidders take their favorite subset of the remaining items given these prices. However, these algorithms traditionally presume the availability of the underlying distributions as part of the input to the algorithm. Contrary to this assumption, practical scenarios often require the learning of distributions, a task complicated by limited sample availability. This paper investigates the feasibility of achieving O(1)-competitive algorithms under the realistic constraint of having access to only a limited number of samples from the underlying bidder distributions. Our first main contribution shows that a mere single sample from each bidder distribution is sufficient to yield an O(1)-competitive algorithm for submodular/XOS valuations. This result leverages a novel extension of the secretary-style analysis, employing the sample to have the algorithm compete against itself. Although online, this first approach does not provide an online truthful mechanism. Our second main contribution shows that a polynomial number of samples suffices to yield a $(2+\epsilon)$-competitive online truthful mechanism for submodular/XOS valuations and any constant $\epsilon>0$. This result is based on a generalization of the median-based algorithm for the single-item prophet inequality problem to combinatorial settings with multiple items.

* Preliminary version in FOCS 2024

Via

Access Paper or Ask Questions

Bandit Algorithms for Prophet Inequality and Pandora's Box

Nov 16, 2022

Khashayar Gatmiry, Thomas Kesselheim, Sahil Singla, Yifan Wang

Abstract:The Prophet Inequality and Pandora's Box problems are fundamental stochastic problem with applications in Mechanism Design, Online Algorithms, Stochastic Optimization, Optimal Stopping, and Operations Research. A usual assumption in these works is that the probability distributions of the $n$ underlying random variables are given as input to the algorithm. Since in practice these distributions need to be learned, we initiate the study of such stochastic problems in the Multi-Armed Bandits model. In the Multi-Armed Bandits model we interact with $n$ unknown distributions over $T$ rounds: in round $t$ we play a policy $x^{(t)}$ and receive a partial (bandit) feedback on the performance of $x^{(t)}$. The goal is to minimize the regret, which is the difference over $T$ rounds in the total value of the optimal algorithm that knows the distributions vs. the total value of our algorithm that learns the distributions from the partial feedback. Our main results give near-optimal $\tilde{O}(\mathsf{poly}(n)\sqrt{T})$ total regret algorithms for both Prophet Inequality and Pandora's Box. Our proofs proceed by maintaining confidence intervals on the unknown indices of the optimal policy. The exploration-exploitation tradeoff prevents us from directly refining these confidence intervals, so the main technique is to design a regret upper bound that is learnable while playing low-regret Bandit policies.

Via

Access Paper or Ask Questions

Online Learning with Vector Costs and Bandits with Knapsacks

Oct 14, 2020

Thomas Kesselheim, Sahil Singla

Abstract:We introduce online learning with vector costs (\OLVCp) where in each time step $t \in \{1,\ldots, T\}$, we need to play an action $i \in \{1,\ldots,n\}$ that incurs an unknown vector cost in $[0,1]^{d}$. The goal of the online algorithm is to minimize the $\ell_p$ norm of the sum of its cost vectors. This captures the classical online learning setting for $d=1$, and is interesting for general $d$ because of applications like online scheduling where we want to balance the load between different machines (dimensions). We study \OLVCp in both stochastic and adversarial arrival settings, and give a general procedure to reduce the problem from $d$ dimensions to a single dimension. This allows us to use classical online learning algorithms in both full and bandit feedback models to obtain (near) optimal results. In particular, we obtain a single algorithm (up to the choice of learning rate) that gives sublinear regret for stochastic arrivals and a tight $O(\min\{p, \log d\})$ competitive ratio for adversarial arrivals. The \OLVCp problem also occurs as a natural subproblem when trying to solve the popular Bandits with Knapsacks (\BwK) problem. This connection allows us to use our \OLVCp techniques to obtain (near) optimal results for \BwK in both stochastic and adversarial settings. In particular, we obtain a tight $O(\log d \cdot \log T)$ competitive ratio algorithm for adversarial \BwK, which improves over the $O(d \cdot \log T)$ competitive ratio algorithm of Immorlica et al. [FOCS'19].

* Preliminary version appeared in COLT 2020

Via

Access Paper or Ask Questions