Weijie Liu

Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation

Feb 12, 2026
Via arXiv

Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models

Feb 12, 2026
Via arXiv

Small Generalizable Prompt Predictive Models Can Steer Efficient RL Post-Training of Large Reasoning Models

Feb 02, 2026
Via arXiv

ORBIT: On-policy Exploration-Exploitation for Controllable Multi-Budget Reasoning

Jan 13, 2026
Via arXiv

VI-MMRec: Similarity-Aware Training Cost-free Virtual User-Item Interactions for Multimodal Recommendation

Dec 09, 2025
Via arXiv

EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control

Nov 19, 2025
Via arXiv

Do Not Step Into the Same River Twice: Learning to Reason from Trial and Error

Oct 30, 2025
Via arXiv

Think Outside the Policy: In-Context Steered Policy Optimization

Oct 30, 2025
Via arXiv

Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought

May 21, 2025
Via arXiv

TACO: Tackling Over-correction in Federated Learning with Tailored Adaptive Correction

Apr 24, 2025
Via arXiv