Xiaojun Quan

When Model Merging Breaks Routing: Training-Free Calibration for MoE

Jun 02, 2026

Unleashing Implicit Rewards: Prefix-Value Learning for Distribution-Level Optimization

Apr 14, 2026

Stabilizing Policy Optimization via Logits Convexity

Mar 01, 2026

ProactiveEval: A Unified Evaluation Framework for Proactive Dialogue Agents

Aug 28, 2025

Discriminative Policy Optimization for Token-Level Reward Models

May 29, 2025

ThinkSwitcher: When to Think Hard, When to Think Fast

May 20, 2025

FuseRL: Dense Preference Optimization for Heterogeneous Model Fusion

Apr 09, 2025

FuseChat-3.0: Preference Optimization Meets Heterogeneous Model Fusion

Mar 06, 2025

Advantage-Guided Distillation for Preference Alignment in Small Language Models

Feb 25, 2025

PsyPlay: Personality-Infused Role-Playing Conversational Agents

Feb 06, 2025