Picture for Xiaoye Qu

Xiaoye Qu

A Survey of Reinforcement Learning for Large Reasoning Models

Add code
Sep 10, 2025
Viaarxiv icon

SafeWork-R1: Coevolving Safety and Intelligence under the AI-45$^{\circ}$ Law

Add code
Jul 24, 2025
Viaarxiv icon

Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning

Add code
Jun 04, 2025
Viaarxiv icon

Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning

Add code
May 26, 2025
Viaarxiv icon

Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models

Add code
May 26, 2025
Viaarxiv icon

SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards

Add code
May 25, 2025
Viaarxiv icon

Step-level Reward for Free in RL-based T2I Diffusion Model Fine-tuning

Add code
May 25, 2025
Viaarxiv icon

Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models

Add code
May 20, 2025
Viaarxiv icon

OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning

Add code
May 13, 2025
Figure 1 for OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
Figure 2 for OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
Figure 3 for OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
Figure 4 for OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
Viaarxiv icon

Learning to Reason under Off-Policy Guidance

Add code
Apr 22, 2025
Viaarxiv icon