Ganqu Cui

MiniCPM4: Ultra-Efficient LLMs on End Devices

Jun 09, 2025

The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

May 28, 2025

Bridging Supervised Learning and Reinforcement Learning in Math Reasoning

May 23, 2025

TTRL: Test-Time Reinforcement Learning

Apr 22, 2025

Learning to Reason under Off-Policy Guidance

Apr 22, 2025

AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset

Apr 04, 2025

A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond

Mar 27, 2025

UltraIF: Advancing Instruction Following from the Wild

Feb 06, 2025

Process Reinforcement through Implicit Rewards

Feb 03, 2025

From Drafts to Answers: Unlocking LLM Potential via Aggregation Fine-Tuning

Jan 21, 2025