Yaojie Lu

Combinatorial Synthesis: Scaling Code RLVR via Atomic Decomposition and Recombination

May 29, 2026
Via arXiv

Your Teacher Can't Help You Here: Combating Supervision Fidelity Decay in On-Policy Distillation

May 29, 2026
Via arXiv

LiteCoder-Terminal: Scaling Long-Horizon Terminal Environments for Learning Language Agents

May 28, 2026
Via arXiv

MetaphorVU: Towards Metaphorical Video Understanding

May 25, 2026
Via arXiv

Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

May 19, 2026
Via arXiv

Learning from Failures: Correction-Oriented Policy Optimization with Verifiable Rewards

May 14, 2026
Via arXiv

All Languages Matter: Understanding and Mitigating Language Bias in Multilingual RAG

Apr 22, 2026
Via arXiv

Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces

Apr 09, 2026
Via arXiv

P^2O: Joint Policy and Prompt Optimization

Mar 23, 2026
Via arXiv

Tackling Length Inflation Without Trade-offs: Group Relative Reward Rescaling for Reinforcement Learning

Mar 11, 2026
Via arXiv