Pengyu Cheng

PolicyAlign: Direct Policy-Based Safety Alignment for Large Language Models

Jun 24, 2026
Via arXiv

GD$^2$PO: Mitigating Multi-Reward Conflicts via Group-Dynamic reward-Decoupled Policy Optimization

Jun 15, 2026
Via arXiv

F3-Tokenizer: Taming Audio Autoencoder Latents for Understanding and Generation

Jun 04, 2026
Via arXiv

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

Jun 02, 2026
Via arXiv

Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

Mar 26, 2026
Via arXiv

MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination

Mar 25, 2026
Via arXiv

Borderless Long Speech Synthesis

Mar 20, 2026
Via arXiv

CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR

Mar 10, 2026
Via arXiv

Eliminating Inductive Bias in Reward Models with Information-Theoretic Guidance

Dec 29, 2025
Via arXiv

Search Self-play: Pushing the Frontier of Agent Capability without Supervision

Oct 21, 2025
Via arXiv