Dingyan Shang

When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL

May 27, 2026
Via arXiv