Abstract: Classifier-guided diffusion models generate conditional samples by augmenting the reverse-time score with the gradient of the log-probability predicted by a probabilistic classifier. In practice, this classifier is usually obtained by minimizing an empirical loss function. While existing statistical theory guarantees good generalization performance when the sample size is sufficiently large, it remains unclear whether such training yields an effective guidance mechanism. We study this question in the context of the cross-entropy loss, which is widely used for classifier training. Under mild smoothness assumptions on the classifier, we show that controlling the cross-entropy at each diffusion step is sufficient to control the corresponding guidance error. In particular, probabilistic classifiers achieving conditional KL divergence $\varepsilon^2$ induce guidance vectors with mean squared error $\widetilde{O}(d\varepsilon)$, up to constant and logarithmic factors. Our result yields an upper bound on the sampling error of classifier-guided diffusion models and bears resemblance to a reverse log-Sobolev-type inequality. To the best of our knowledge, this is the first result that quantitatively links classifier training to guidance alignment in diffusion models, providing both a theoretical explanation for the empirical success of classifier guidance and principled guidelines for selecting classifiers that induce effective guidance.
Abstract: Classifier-guided diffusion models generate conditional samples by augmenting the reverse-time score with the gradient of a learned classifier, yet it remains unclear whether standard classifier training procedures yield effective diffusion guidance. We address this gap by showing that, under mild smoothness assumptions on the classifiers, controlling the cross-entropy error at each diffusion step also controls the error of the resulting guidance vectors: classifiers achieving conditional KL divergence $\varepsilon^2$ from the ground-truth conditional label probabilities induce guidance vectors with mean squared error $\widetilde{O}(d \varepsilon )$. Our result yields an upper bound on the sampling error under classifier guidance and bears resemblance to a reverse log-Sobolev-type inequality. Moreover, we show that the classifier smoothness assumption is essential, by constructing simple counterexamples demonstrating that, without it, control of the guidance vector can fail for almost all distributions. To our knowledge, our work establishes the first quantitative link between classifier training and guidance alignment, yielding both a theoretical foundation for classifier guidance and principled guidelines for classifier selection.
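For reference, the guidance mechanism both abstracts describe is usually written via Bayes' rule (a standard formulation; the notation $p_t$, $s_\theta$, $\hat p_t$, and the guidance weight $w$ is illustrative and not taken from the abstracts):
$$
\nabla_x \log p_t(x \mid y) \;=\; \nabla_x \log p_t(x) \;+\; \nabla_x \log p_t(y \mid x),
\qquad
\hat s_t(x, y) \;=\; s_\theta(x, t) \;+\; w\, \nabla_x \log \hat p_t(y \mid x),
$$
where $p_t$ is the noised marginal at diffusion time $t$, $s_\theta$ is the learned unconditional score, and $\hat p_t(y \mid x)$ is the trained probabilistic classifier. The guidance error bounded above is then the mean squared gap between $\nabla_x \log \hat p_t(y \mid x)$ and the ground-truth $\nabla_x \log p_t(y \mid x)$.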
Abstract: While momentum-based acceleration has been studied extensively in deterministic optimization, its behavior in nonstationary environments, where the data distribution and optimal parameters drift over time, remains underexplored. We analyze the tracking performance of Stochastic Gradient Descent (SGD) and its momentum variants (Polyak heavy-ball and Nesterov) under uniform strong convexity and smoothness, across varying stepsize regimes. We derive finite-time bounds, in expectation and with high probability, on the tracking error, establishing a sharp decomposition into three components: a transient initialization term, a noise-induced variance term, and a drift-induced tracking lag. Crucially, our analysis uncovers a fundamental trade-off: while momentum can suppress gradient noise, it incurs an explicit penalty on tracking capability. We show that momentum can substantially amplify drift-induced tracking error, with an amplification factor that becomes unbounded as the momentum parameter approaches one, formalizing the intuition that relying on 'stale' gradients hinders adaptation to rapid regime shifts. Complementing these upper bounds, we establish minimax lower bounds for dynamic regret under gradient-variation constraints. These lower bounds prove that the inertia-induced penalty is not an artifact of analysis but an information-theoretic barrier: in drift-dominated regimes, momentum creates an unavoidable 'inertia window' that fundamentally degrades performance. Collectively, these results provide a definitive theoretical grounding for the empirical instability of momentum in dynamic environments and delineate the precise regime boundaries in which SGD provably outperforms its accelerated counterparts.
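For concreteness, a minimal sketch of the updates being compared (Polyak heavy-ball shown; the symbols $\eta$, $\beta$, $g_t$, $\theta_t^\star$ are illustrative and not taken from the abstract):
$$
x_{t+1} \;=\; x_t \;-\; \eta\, g_t(x_t) \;+\; \beta\,(x_t - x_{t-1}),
$$
where $\eta$ is the stepsize, $\beta \in [0,1)$ the momentum parameter, and $g_t$ an unbiased stochastic gradient of the time-varying objective with minimizer $\theta_t^\star$. The three-term decomposition described above bounds $\mathbb{E}\|x_t - \theta_t^\star\|^2$ by a transient term depending on the initialization, a variance term driven by the gradient noise, and a tracking-lag term driven by the drift $\|\theta_{t+1}^\star - \theta_t^\star\|$, with the last term amplified as $\beta \to 1$.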
Abstract: We study regret minimization under privacy constraints in episodic inhomogeneous linear Markov Decision Processes (MDPs), motivated by the growing use of reinforcement learning (RL) in personalized decision-making systems that rely on sensitive user data. In this setting, both transition probabilities and reward functions are assumed to be linear in a feature mapping $\phi(s, a)$, and we aim to ensure privacy through joint differential privacy (JDP), a relaxation of differential privacy suited to online learning. Prior work has established suboptimal regret bounds by privatizing the LSVI-UCB algorithm, which achieves $\widetilde{O}(\sqrt{d^3 H^4 K})$ regret in the non-private setting. Building on recent advances that improve this to the minimax optimal regret $\widetilde{O}(d\sqrt{H^3 K})$ via LSVI-UCB++ with Bernstein-style bonuses, we design a new differentially private algorithm by privatizing LSVI-UCB++ and adapting variance-aware analysis techniques from offline RL. Our algorithm achieves a regret bound of $\widetilde{O}(d \sqrt{H^3 K} + H^{4.5} d^{7/6} \sqrt{K} / \epsilon)$, improving over previous private methods. Empirical results show that our algorithm retains near-optimal utility compared to non-private baselines, indicating that privacy can be achieved with minimal performance degradation in this setting.
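As background on the setting (standard linear-MDP notation; the symbols $P_h$, $r_h$, $\mu_h$, $\theta_h$ are illustrative and not taken from the abstract): linearity of transitions and rewards in the feature map $\phi(s, a) \in \mathbb{R}^d$ means
$$
P_h(s' \mid s, a) \;=\; \langle \phi(s, a), \mu_h(s') \rangle,
\qquad
r_h(s, a) \;=\; \langle \phi(s, a), \theta_h \rangle,
$$
for each step $h$ of the horizon $H$, over $K$ episodes in total. Under this reading, the first term $\widetilde{O}(d\sqrt{H^3 K})$ of the stated bound matches the non-private rate, and the cost of privacy enters only through the additive $\widetilde{O}(H^{4.5} d^{7/6} \sqrt{K} / \epsilon)$ term.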