Jason R Brown

KL-Regularised Q-Learning: A Token-level Action-Value perspective on Online RLHF

Aug 23, 2025
Via arXiv