Guanjun Jiang

MARCH: Multi-Agent Reinforced Self-Check for LLM Hallucination

Mar 25, 2026
Via arXiv

Rationale Matters: Learning Transferable Rubrics via Proxy-Guided Critique for VLM Reward Models

Mar 17, 2026
Via arXiv

Grounding the Score: Explicit Visual Premise Verification for Reliable Vision-Language Process Reward Models

Mar 17, 2026
Via arXiv

QuarkMedBench: A Real-World Scenario Driven Benchmark for Evaluating Large Language Models

Mar 14, 2026
Via arXiv

CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR

Mar 10, 2026
Via arXiv

Open Rubric System: Scaling Reinforcement Learning with Pairwise Adaptive Rubric

Feb 15, 2026
Via arXiv

Quark Medical Alignment: A Holistic Multi-Dimensional Alignment and Collaborative Optimization Paradigm

Feb 12, 2026
Via arXiv

Answer First, Reason Later: Aligning Search Relevance via Mode-Balanced Reinforcement Learning

Feb 10, 2026
Via arXiv

SiameseNorm: Breaking the Barrier to Reconciling Pre/Post-Norm

Feb 08, 2026
Via arXiv

ETR: Outcome-Guided Elastic Trust Regions for Policy Optimization

Jan 07, 2026
Via arXiv