Wenping Hu

Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models

Sep 30, 2025
Via arXiv

Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization

Aug 12, 2025
Via arXiv

Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models

Dec 10, 2024
Via arXiv