Yibiao Chen

SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks

Apr 10, 2026
Via arXiv

Anchored Policy Optimization: Mitigating Exploration Collapse Via Support-Constrained Rectification

Feb 05, 2026
Via arXiv