Jason R Brown

KL-Regularised Q-Learning: A Token-level Action-Value perspective on Online RLHF

Aug 23, 2025
Via arXiv