Xueqi Cheng

Dynamic Rollout Editing for Reducing Overthinking in RL-Trained Reasoning Models

Jun 16, 2026
Via arXiv

SkillAudit: Ground-Truth-Free Skill Evolution via Paired Trajectory Auditing

Jun 12, 2026
Via arXiv

SAW: Stage-Aware Dynamic Weighting for Multi-Objective Reinforcement Learning in Large Language Models

Jun 05, 2026
Via arXiv

Code-on-Graph: Iterative Programmatic Reasoning via Large Language Models on Knowledge Graphs

Jun 02, 2026
Via arXiv

Can LLM Rerankers Predict Their Own Ranking Performance?

Jun 02, 2026
Via arXiv

The Stability of Singular Distribution: A Spectral Perspective on the Two-Phase Dynamics of Language Model Pre-training

May 26, 2026
Via arXiv

Extra-Merge: Tracing the Rank-1 Subspace of Model Merging in Language Model Pre-Training

May 26, 2026
Via arXiv

EGAD: Entropy-Guided Adaptive Distillation for Token-Level Knowledge Transfer

May 03, 2026
Via arXiv

Detoxification for LLM: From Dataset Itself

Apr 21, 2026
Via arXiv

AdversarialCoT: Single-Document Retrieval Poisoning for LLM Reasoning

Apr 14, 2026
Via arXiv