
Hyungkyu Kang

Multi-Step Likelihood-Ratio Correction for Reinforcement Learning with Verifiable Rewards

May 20, 2026
Via arXiv

Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning

Mar 07, 2025
Via arXiv