Zhengqing Yan

GDEPO: Group Dual-dynamic and Equal-right-advantage Policy Optimization with Enhanced Training Data Utilization for Sample-Constrained Reinforcement Learning

Jan 11, 2026
Via arXiv