Xiaohan Wang

Sue

Joint Training of Multi-Token Prediction in Reinforcement Learning via Optimal Coefficient Calibration

May 27, 2026
Via arXiv

ZipRL: Adaptive Multi-Turn Context Compression with Hindsight Response Replay

May 27, 2026
Via arXiv

On the Hidden Costs of Counterfactual Knowledge Training in LLM Unlearning

May 26, 2026
Via arXiv

LiveK12Bench: Have Large Multimodal Models Truly Conquered High School-level Examinations?

May 26, 2026
Via arXiv

When Self-Belief Misleads: Active Label Acquisition for Reinforcement Learning with Verifiable Rewards

May 25, 2026
Via arXiv

Implicit Hierarchical GRPO: Decoupling Tool Invocation from Execution for Tool-Integrated Mathematical Reasoning

May 18, 2026
Via arXiv

AMR-SD: Asymmetric Meta-Reflective Self-Distillation for Token-Level Credit Assignment

May 18, 2026
Via arXiv

RecRM-Bench: Benchmarking Multidimensional Reward Modeling for Agentic Recommender Systems

May 12, 2026
Via arXiv

Full-Spectrum Graph Neural Network: Expressive and Scalable

May 07, 2026
Via arXiv

Uncovering Entity Identity Confusion in Multimodal Knowledge Editing

May 07, 2026
Via arXiv