Picture for Yangou Ouyang

Yangou Ouyang

Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration

Add code
Jan 12, 2026
Viaarxiv icon

MAESTRO: Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization

Add code
Jan 12, 2026
Viaarxiv icon