Xianpei Han

P^2O: Joint Policy and Prompt Optimization

Mar 23, 2026

Tackling Length Inflation Without Trade-offs: Group Relative Reward Rescaling for Reinforcement Learning

Mar 11, 2026

Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards

Mar 10, 2026

DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation

Feb 26, 2026

Coupled Variational Reinforcement Learning for Language Model General Reasoning

Dec 14, 2025

AI-Salesman: Towards Reliable Large Language Model Driven Telemarketing

Nov 15, 2025

RMTBench: Benchmarking LLMs Through Multi-Turn User-Centric Role-Playing

Jul 27, 2025

ARise: Towards Knowledge-Augmented Reasoning via Risk-Adaptive Search

Apr 15, 2025

Memorizing is Not Enough: Deep Knowledge Injection Through Reasoning

Apr 01, 2025

ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers

Apr 01, 2025