Weijie Liu

ADWIN: Adaptive Windows for Horizon-Aware On-Policy Distillation

May 27, 2026
Via arXiv

RLVR Datasets and Where to Find Them: Tracing Data Lineage for Better Training Data

May 26, 2026
Via arXiv

Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation

May 13, 2026
Via arXiv

Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex

May 07, 2026
Via arXiv

Tool Learning Needs Nothing More Than a Free 8B Language Model

Apr 20, 2026
Via arXiv

Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation

Feb 12, 2026
Via arXiv

Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models

Feb 12, 2026
Via arXiv

Small Generalizable Prompt Predictive Models Can Steer Efficient RL Post-Training of Large Reasoning Models

Feb 02, 2026
Via arXiv

ORBIT: On-policy Exploration-Exploitation for Controllable Multi-Budget Reasoning

Jan 13, 2026
Via arXiv

VI-MMRec: Similarity-Aware Training Cost-free Virtual User-Item Interactions for Multimodal Recommendation

Dec 09, 2025
Via arXiv