Muzhi Dai

Stable Reinforcement Learning for Efficient Reasoning

May 23, 2025, via arXiv

S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models

May 12, 2025, via arXiv

From Captions to Rewards (CAREVL): Leveraging Large Language Model Experts for Enhanced Reward Modeling in Large Vision-Language Models

Mar 08, 2025, via arXiv