Lantao Mei

$f$-GRPO and Beyond: Divergence-Based Reinforcement Learning Algorithms for General LLM Alignment

Feb 05, 2026
Via arXiv