Wayne Xin Zhao

From Trial-and-Error to Improvement: A Systematic Analysis of LLM Exploration Mechanisms in RLVR

Aug 11, 2025
Via arXiv

BEE-RAG: Balanced Entropy Engineering for Retrieval-Augmented Generation

Aug 07, 2025
Via arXiv

WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training

Jul 23, 2025
Via arXiv

Enhancing Cross-task Transfer of Large Language Models via Activation Steering

Jul 17, 2025
Via arXiv

AVC-DPO: Aligned Video Captioning via Direct Preference Optimization

Jul 02, 2025
Via arXiv

Reasoning with Exploration: An Entropy Perspective

Jun 17, 2025
Via arXiv

ICPC-Eval: Probing the Frontiers of LLM Reasoning with Competitive Programming Contests

Jun 05, 2025
Via arXiv

Towards Effective Code-Integrated Reasoning

May 30, 2025
Via arXiv

Reinforced Informativeness Optimization for Long-Form Retrieval-Augmented Generation

May 27, 2025
Via arXiv

MMATH: A Multilingual Benchmark for Mathematical Reasoning

May 25, 2025
Via arXiv