Yongmin Kim

On Advantage Estimates for Max@K Policy Gradients

Jun 04, 2026
Via arXiv

OrderGrad: Optimizing Beyond the Mean with Order-Statistic Policy Gradient Estimation

Jun 04, 2026
Via arXiv