Chao Xin

PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier

Jun 12, 2025
Via arXiv

A Unified Pairwise Framework for RLHF: Bridging Generative Reward Modeling and Policy Optimization

Apr 07, 2025
Via arXiv

Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback

Mar 31, 2025
Via arXiv