Abstract:Existing analyses of reverse diffusion often propagate sampling error in the Euclidean geometry underlying \(\Wtwo\) along the entire reverse trajectory. Under weak log-concavity, however, Gaussian smoothing can create contraction first at large separations while short separations remain non-dissipative. The first usable contraction is therefore radial rather than Euclidean, creating a metric mismatch between the geometry that contracts early and the geometry in which the terminal error is measured. We formalize this mismatch through an explicit radial lower profile for the learned reverse drift. Its far-field limit gives a contraction reserve, its near-field limit gives the Euclidean load governing direct \(\Wtwo\) propagation, and admissible switch times are characterized by positivity of the reserve on the remaining smoothing window. We exploit this structure with a one-switch routing argument. Before the switch, reflection coupling yields contraction in a concave transport metric adapted to the radial profile. At the switch, we convert once from this metric back to \(\Wtwo\) under a \(p\)-moment budget, and then propagate the converted discrepancy over the remaining short window in Euclidean geometry. For discretizations of the learned reverse SDE under \(L^2\) score-error control, a one-sided Lipschitz condition of score error, and standard well-posedness and coupling hypotheses, we obtain explicit non-asymptotic end-to-end \(\Wtwo\) guarantees, a scalar switch-selection objective, and a sharp structural limit on the conversion exponent within the affine-tail concave class.
Abstract:We study stochastic multi-armed bandits under simultaneous constraints on space and adaptivity: the learner interacts with the environment in $B$ batches and has only $W$ bits of persistent memory. Prior work shows that each constraint alone is surprisingly mild: near-minimax regret $\widetilde{O}(\sqrt{KT})$ is achievable with $O(\log T)$ bits of memory under fully adaptive interaction, and with a $K$-independent $O(\log\log T)$-type number of batches when memory is unrestricted. We show that this picture breaks down in the simultaneously constrained regime. We prove that any algorithm with a $W$-bit memory constraint must use at least $Ω(K/W)$ batches to achieve near-minimax regret $\widetilde{O}(\sqrt{KT})$ , even under adaptive grids. In particular, logarithmic memory rules out $K$-independent batch complexity. Our proof is based on an information bottleneck. We show that near-minimax regret forces the learner to acquire $Ω(K)$ bits of information about the hidden set of good arms under a suitable hard prior, whereas an algorithm with $B$ batches and $W$ bits of memory allows only $O(BW)$ bits of information. A key ingredient is a localized change-of-measure lemma that yields probability-level arm exploration guarantees, which is of independent interest. We also give an algorithm using $O(\log T)$ bits of memory and $\widetilde{O}(K)$ batches that achieves regret $\widetilde{O}(\sqrt{KT})$, which nearly matches our lower bound.