Runze Liu

Reformulate LLM Reinforcement Learning for Efficient Training under Black-box Discrepancy

Jun 09, 2026

When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff

Jun 07, 2026

Local Guidance, Global Impact: Gaussian-Reshaped Trust Region Unlocks Behavior Transitions

Jun 04, 2026

Temporal Difference Learning with Constrained Initial Representations

Feb 12, 2026

PROF: An LLM-based Reward Code Preference Optimization Framework for Offline Imitation Learning

Nov 14, 2025

Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models

Sep 30, 2025

A Survey of Reinforcement Learning for Large Reasoning Models

Sep 10, 2025

ReviewRL: Towards Automated Scientific Review with RL

Aug 14, 2025

GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning

Apr 01, 2025

VLP: Vision-Language Preference Learning for Embodied Manipulation

Feb 17, 2025