Mengyu Zhou

GD$^2$PO: Mitigating Multi-Reward Conflicts via Group-Dynamic reward-Decoupled Policy Optimization

Jun 15, 2026
Via arXiv

Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

Jun 02, 2026
Via arXiv

Fill the GAP: A Granular Alignment Paradigm for Visual Reasoning in Multimodal Large Language Models

May 12, 2026
Via arXiv

ATP-Bench: Towards Agentic Tool Planning for MLLM Interleaved Generation

Mar 31, 2026
Via arXiv

Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

Mar 26, 2026
Via arXiv

MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination

Mar 25, 2026
Via arXiv

Grounding the Score: Explicit Visual Premise Verification for Reliable Vision-Language Process Reward Models

Mar 17, 2026
Via arXiv

Rationale Matters: Learning Transferable Rubrics via Proxy-Guided Critique for VLM Reward Models

Mar 17, 2026
Via arXiv

Open Rubric System: Scaling Reinforcement Learning with Pairwise Adaptive Rubric

Feb 15, 2026
Via arXiv

SiameseNorm: Breaking the Barrier to Reconciling Pre/Post-Norm

Feb 08, 2026
Via arXiv