Xiaoyue Ma

Resource-Efficient Reinforcement for Reasoning Large Language Models via Dynamic One-Shot Policy Refinement

Jan 31, 2026
Via arXiv