Yurun Yuan

Reinforce LLM Reasoning through Multi-Agent Reflection

Jun 10, 2025
Via arXiv

Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning

May 21, 2025
Via arXiv