Jiakang Wang

Robust Tool Use via Fission-GRPO: Learning to Recover from Execution Errors

Jan 22, 2026
Via arXiv

Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models

Sep 30, 2025
Via arXiv