Rujun Guo

Answer First, Reason Later: Aligning Search Relevance via Mode-Balanced Reinforcement Learning

Feb 10, 2026
Via arXiv

ETR: Outcome-Guided Elastic Trust Regions for Policy Optimization

Jan 07, 2026
Via arXiv