Picture for Lizhou Cai

Lizhou Cai

Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex

Add code
May 07, 2026
Viaarxiv icon