Xiong Jun Wu

Missing Old Logits in Asynchronous Agentic RL: Semantic Mismatch and Repair Methods for Off-Policy Correction

May 12, 2026
Via arXiv

SHARP: Synthesizing High-quality Aligned Reasoning Problems for Large Reasoning Models Reinforcement Learning

May 21, 2025
Via arXiv