Yitao Bai

Finite-Time Analysis of Projected Two-Time-Scale Stochastic Approximation

Mar 31, 2026

Yitao Bai, Thinh T. Doan, Justin Romberg

Abstract: We study the finite-time convergence of projected linear two-time-scale stochastic approximation with constant step sizes and Polyak–Ruppert averaging. We establish an explicit mean-square error bound that decomposes into two interpretable components: an approximation error determined by the constrained subspace, and a statistical error that decays at a sublinear rate. The constants in the bound are expressed through restricted stability margins and a coupling invertibility condition, and they cleanly separate the effect of the subspace choice (approximation error) from the effect of the averaging horizon (statistical error). We illustrate our theoretical results through numerical experiments on both synthetic and reinforcement learning problems.

* 6 pages, 3 figures
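
To make the setting concrete, here is a minimal sketch of projected linear two-time-scale stochastic approximation with constant step sizes and Polyak–Ruppert averaging. It is not the authors' code, and several details are assumptions: the fast iterate x uses step size alpha and the slow iterate y uses beta, the noise is i.i.d. additive Gaussian, and each constraint subspace is given by an orthogonal projection matrix (Px, Py) applied after every step. The names `projected_tts`, `projector`, and `matvec` are hypothetical.

```python
import random


def matvec(M, v):
    """Dense matrix-vector product for small list-of-lists matrices."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def projector(u):
    """Orthogonal projection matrix onto span(u); u need not be unit length."""
    n2 = sum(ui * ui for ui in u)
    return [[u[i] * u[j] / n2 for j in range(len(u))] for i in range(len(u))]


def projected_tts(A11, A12, A21, A22, b1, b2, Px, Py,
                  alpha, beta, T, noise_std=1.0, seed=0):
    """Hypothetical sketch of projected linear two-time-scale SA.

    Fast iterate x (step alpha): x <- Px(x + alpha*(b1 - A11 x - A12 y + noise))
    Slow iterate y (step beta):  y <- Py(y + beta*(b2 - A21 x - A22 y + noise))
    Both updates use the previous (x, y). Returns the Polyak-Ruppert
    averages of x_1..x_T and y_1..y_T.
    """
    rng = random.Random(seed)
    x, y = [0.0] * len(b1), [0.0] * len(b2)  # zero lies in every subspace
    x_sum, y_sum = [0.0] * len(b1), [0.0] * len(b2)
    for _ in range(T):
        # Noisy residuals of the two coupled linear equations.
        gx = [b - p - q + rng.gauss(0.0, noise_std)
              for b, p, q in zip(b1, matvec(A11, x), matvec(A12, y))]
        gy = [b - p - q + rng.gauss(0.0, noise_std)
              for b, p, q in zip(b2, matvec(A21, x), matvec(A22, y))]
        # Constant-step updates, each projected back onto its subspace.
        x = matvec(Px, [xi + alpha * g for xi, g in zip(x, gx)])
        y = matvec(Py, [yi + beta * g for yi, g in zip(y, gy)])
        x_sum = [s + xi for s, xi in zip(x_sum, x)]
        y_sum = [s + yi for s, yi in zip(y_sum, y)]
    return [s / T for s in x_sum], [s / T for s in y_sum]


# Toy instance: x restricted to span{(1, 1)}, y restricted to span{(1, 0)}.
A11 = [[2.0, 0.0], [0.0, 2.0]]
A12 = [[0.5, 0.0], [0.0, 0.5]]
A21 = [[0.5, 0.0], [0.0, 0.5]]
A22 = [[1.0, 0.0], [0.0, 1.0]]
x_bar, y_bar = projected_tts(A11, A12, A21, A22, b1=[1.0, 1.0], b2=[1.0, -1.0],
                             Px=projector([1.0, 1.0]), Py=projector([1.0, 0.0]),
                             alpha=0.05, beta=0.01, T=20000)
print(x_bar, y_bar)
```

For this toy instance, writing x* = s(1, 1) and y* = (t, 0), the projected fixed point solves 2 - 4s - t/2 = 0 and 1 - s/2 - t = 0, so s = 0.4 and t = 0.8; the averaged iterates should land close to those values. The unconstrained solution of the same linear system is different (x ≈ (0.29, 0.86)), which illustrates the kind of gap the abstract attributes to the choice of subspace.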

Accelerating Multi-Task Temporal Difference Learning under Low-Rank Representation

Mar 03, 2025

Yitao Bai, Sihan Zeng, Justin Romberg, Thinh T. Doan

Abstract: We study policy evaluation problems in multi-task reinforcement learning (RL) under a low-rank representation setting. In this setting, we are given $N$ learning tasks whose value functions lie in an $r$-dimensional subspace, with $r<N$. The classic temporal-difference (TD) learning method can be applied to these problems, but it learns the value function of each task independently. In this paper, we ask whether the low-rank structure of the multi-task setting can be exploited to accelerate TD learning. To answer this question, we propose a new variant of TD learning that integrates a truncated singular value decomposition (SVD) step into the TD update. This additional step enables TD learning to update its iterates along the dominant directions induced by the low-rank structure, thereby improving its performance. Our empirical results show that the proposed method significantly outperforms classic TD learning, and the performance gap increases as the rank $r$ decreases. From a theoretical point of view, introducing the truncated SVD step into TD learning might make the updates unstable. We prove that this instability does not occur: the proposed method converges at a rate $\mathcal{O}(\frac{\ln(t)}{t})$, where $t$ is the number of iterations, which matches the rate of standard TD learning.

* 13 pages, 3 figures
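
The abstract does not spell out the exact update, so the following is only a minimal sketch of one way a truncated-SVD step can be combined with tabular TD(0) across N tasks. It assumes a transition matrix shared by all tasks, a generative model that samples one next state for every state at each iteration, a rank-r truncation applied after every TD step, and a diminishing step size c/(t + t0). The names `multitask_td_svd` and `truncate_rank`, and all parameter values, are hypothetical; this is not the authors' algorithm. The truncation relies on the Eckart–Young theorem: keeping the top r singular triplets gives the best rank-r approximation in Frobenius norm.

```python
import numpy as np


def truncate_rank(X, r):
    """Best rank-r approximation of X in Frobenius norm, via a thin SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]


def multitask_td_svd(P, R, gamma, r, T, c=4.0, t0=40, seed=0):
    """Hypothetical sketch: synchronous tabular TD(0) for N tasks plus rank-r truncation.

    P: (S, S) transition matrix shared by all tasks under the evaluated policy.
    R: (S, N) expected one-step rewards, one column per task.
    Returns an (S, N) matrix whose column i estimates task i's value function.
    """
    rng = np.random.default_rng(seed)
    S, N = R.shape
    V = np.zeros((S, N))
    for t in range(T):
        alpha = c / (t + t0)  # diminishing step size (an assumption)
        # Generative-model sampling: one next state s' ~ P[s, :] for every state s.
        nxt = np.array([rng.choice(S, p=P[s]) for s in range(S)])
        td_error = R + gamma * V[nxt] - V  # (S, N): TD error of each state for each task
        V = truncate_rank(V + alpha * td_error, r)  # keep the top-r singular directions
    return V


# Toy problem with rank-r rewards, so the true value matrix has rank <= r.
gen = np.random.default_rng(1)
S, N, r, gamma = 6, 4, 2, 0.5
P = gen.random((S, S))
P /= P.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
R = gen.random((S, r)) @ gen.random((r, N))
V_true = np.linalg.solve(np.eye(S) - gamma * P, R)  # Bellman equation V = R + gamma P V
V_hat = multitask_td_svd(P, R, gamma, r, T=3000)
rel_err = np.linalg.norm(V_hat - V_true) / np.linalg.norm(V_true)
print(f"relative error: {rel_err:.4f}")
```

Because the rewards are generated with rank r, the true value matrix V = (I - γP)^{-1} R also has rank at most r, so it is a fixed point of the truncation step; the truncation only discards the trailing singular directions of each noisy iterate.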