Title: Efficient $Q$-Learning and Actor-Critic Methods for Robust Average Reward Reinforcement Learning

Jun 08, 2025

Yang Xu, Swetha Ganesh, Vaneet Aggarwal

Abstract: We present the first $Q$-learning and actor-critic algorithms for robust average reward Markov Decision Processes (MDPs) with non-asymptotic convergence guarantees under contamination, total variation (TV) distance, and Wasserstein distance uncertainty sets. We show that the robust $Q$ Bellman operator is a strict contraction with respect to a carefully constructed semi-norm in which constant functions are quotiented out. This property supports a stochastic approximation update that learns the optimal robust $Q$ function in $\tilde{\mathcal{O}}(\epsilon^{-2})$ samples. We also show that the same idea can be used to estimate the robust $Q$ function of a given policy, which in turn serves as the critic. Coupling this with the theory of robust policy mirror descent, we present a natural actor-critic algorithm that attains an $\epsilon$-optimal robust policy in $\tilde{\mathcal{O}}(\epsilon^{-3})$ samples. These results advance the theory of distributionally robust reinforcement learning in the average reward setting.

* arXiv admin note: text overlap with arXiv:2502.16816
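
To make the abstract's two main objects concrete, here is a minimal sketch, not the paper's algorithm. It assumes the usual closed form for a $\delta$-contamination set, where the worst-case expectation of $V$ is $(1-\delta)\,P_0 V + \delta \min_s V(s)$, and it uses the span semi-norm as a simple stand-in for "a semi-norm with constants quotiented out"; the paper's own semi-norm is constructed differently. The names `span_seminorm`, `robust_bellman_contamination`, `P0`, and `delta` are hypothetical, and the operator omits the average-reward (gain) offset for brevity.

```python
import numpy as np

def span_seminorm(x):
    """Span semi-norm max(x) - min(x); it is zero exactly on constant arrays."""
    x = np.asarray(x, dtype=float)
    return float(x.max() - x.min())

def robust_bellman_contamination(Q, r, P0, delta):
    """Uncentred robust Q Bellman operator under delta-contamination (assumed form).

    Q, r: arrays of shape (S, A); P0: nominal transition kernel of shape (S, A, S).
    """
    V = Q.max(axis=1)                  # greedy state value V(s) = max_a Q(s, a)
    nominal = P0 @ V                   # expected next value under P0, shape (S, A)
    worst_case = (1.0 - delta) * nominal + delta * V.min()
    return r + worst_case

# Illustrative random instance (all values are made up for the demo).
rng = np.random.default_rng(0)
S, A, delta = 4, 2, 0.1
P0 = rng.random((S, A, S))
P0 /= P0.sum(axis=2, keepdims=True)    # normalise rows into distributions
r = rng.random((S, A))
Q1 = rng.normal(size=(S, A))
Q2 = rng.normal(size=(S, A))

TQ1 = robust_bellman_contamination(Q1, r, P0, delta)
TQ2 = robust_bellman_contamination(Q2, r, P0, delta)

# Shifting Q by a constant shifts T(Q) by the same constant, so the
# difference is invisible to a semi-norm that quotients out constants.
shift_gap = span_seminorm(
    robust_bellman_contamination(Q1 + 5.0, r, P0, delta) - TQ1
)
before = span_seminorm(Q1 - Q2)
after = span_seminorm(TQ1 - TQ2)
```

In this sketch `shift_gap` is zero up to floating-point error, and `after` does not exceed `before`, since the operator is monotone and commutes with constant shifts. That shows non-expansiveness only; the strict contraction claimed in the abstract depends on the paper's own semi-norm and assumptions.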
