Title: Enhancing Q-Value Updates in Deep Q-Learning via Successor-State Prediction

Nov 05, 2025

Lipeng Zu, Hansong Zhou, Xiaonan Zhang

Figures 1-4 for Enhancing Q-Value Updates in Deep Q-Learning via Successor-State Prediction

Abstract: Deep Q-Networks (DQNs) estimate future returns by learning from transitions sampled from a replay buffer. However, the target updates in DQN often rely on next states generated by actions from a past, potentially suboptimal policy. As a result, these states may not provide informative learning signals, which introduces high variance into the update process. This issue is exacerbated when the sampled transitions are poorly aligned with the agent's current policy. To address this limitation, we propose the Successor-state Aggregation Deep Q-Network (SADQ), which explicitly models environment dynamics using a stochastic transition model. SADQ integrates successor-state distributions into the Q-value estimation process, enabling more stable and policy-aligned value updates. Additionally, it uses the modeled transition structure to explore a more efficient action selection strategy. We provide theoretical guarantees that SADQ maintains unbiased value estimates while reducing training variance. Our extensive empirical results across standard RL benchmarks and real-world vector-based control tasks demonstrate that SADQ consistently outperforms DQN variants in both stability and learning efficiency.
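To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It compares a standard one-step DQN target, built from the single next state stored in the replay buffer, with a target that averages over a predicted successor-state distribution. The function names, the discrete set of candidate successors, and the inputs `succ_probs` and `q_succ` are assumptions made for illustration; the abstract does not specify how SADQ represents or aggregates its transition model.

```python
from typing import Sequence


def sampled_target(reward: float, q_next: Sequence[float],
                   gamma: float = 0.99, done: bool = False) -> float:
    """Standard one-step DQN target from the single sampled next state."""
    if done:
        return reward
    return reward + gamma * max(q_next)


def aggregated_target(reward: float, succ_probs: Sequence[float],
                      q_succ: Sequence[Sequence[float]],
                      gamma: float = 0.99, done: bool = False) -> float:
    """Target averaged over a predicted successor-state distribution.

    succ_probs: weights over K candidate successor states (assumed to come
                from a learned transition model; hypothetical here).
    q_succ:     Q-values for each candidate successor, one row per state.
    """
    if done:
        return reward
    total = sum(succ_probs)
    if total <= 0:
        raise ValueError("successor weights must sum to a positive value")
    # Expected value of max_a' Q(s', a') under the normalized distribution.
    expected_max = sum(p / total * max(q) for p, q in zip(succ_probs, q_succ))
    return reward + gamma * expected_max


if __name__ == "__main__":
    # One stored next state versus two equally likely predicted successors.
    print(sampled_target(1.0, [1.0, 2.0], gamma=0.9))
    print(aggregated_target(1.0, [0.5, 0.5], [[1.0, 2.0], [3.0, 0.0]], gamma=0.9))
```

The averaged form replaces a single, possibly unrepresentative next state with an expectation, which is one plausible reading of how successor-state distributions could reduce target variance.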
