Policy Gradient


Equivalence of stochastic and deterministic policy gradients

May 29, 2025

On Global Convergence Rates for Federated Policy Gradient under Heterogeneous Environment

May 29, 2025

Enhanced DACER Algorithm with High Diffusion Efficiency

May 29, 2025

The challenge of hidden gifts in multi-agent reinforcement learning

May 29, 2025

On-Policy RL with Optimal Reward Baseline

May 29, 2025

Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners

May 29, 2025

Frequency Resource Management in 6G User-Centric CFmMIMO: A Hybrid Reinforcement Learning and Metaheuristic Approach

May 28, 2025

The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

May 28, 2025

Text2Grad: Reinforcement Learning from Natural Language Feedback

May 28, 2025

SDPO: Importance-Sampled Direct Preference Optimization for Stable Diffusion Training

May 28, 2025