Yingbin Liang

Reinforcement Learning from Multi-Source Imperfect Preferences: Best-of-Both-Regimes Regret

Mar 20, 2026
Via arXiv

HIPO: Instruction Hierarchy via Constrained Reinforcement Learning

Mar 17, 2026
Via arXiv

Why Adam Can Beat SGD: Second-Moment Normalization Yields Sharper Tails

Mar 03, 2026
Via arXiv

Sharp Convergence Rates for Masked Diffusion Models

Feb 26, 2026
Via arXiv

On the Learning Dynamics of RLVR at the Edge of Competence

Feb 16, 2026
Via arXiv

Constraint-Rectified Training for Efficient Chain-of-Thought

Feb 13, 2026
Via arXiv

Reward Modeling for Reinforcement Learning-Based LLM Reasoning: Design, Challenges, and Evaluation

Feb 10, 2026
Via arXiv

Learnable Chernoff Baselines for Inference-Time Alignment

Feb 08, 2026
Via arXiv

Bridging Online and Offline RL: Contextual Bandit Learning for Multi-Turn Code Generation

Feb 03, 2026
Via arXiv

ConvexBench: Can LLMs Recognize Convex Functions?

Feb 01, 2026
Via arXiv