Wen Sun

Correcting the Mythos of KL-Regularization: Direct Alignment without Overparameterization via Chi-squared Preference Optimization

Jul 18, 2024

On Speeding Up Language Model Evaluation

Jul 08, 2024

Orchestrating LLMs with Different Personalizations

Jul 04, 2024

Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics

Jun 17, 2024

Understanding Preference Fine-Tuning Through the Lens of Coverage

Jun 03, 2024

REBEL: Reinforcement Learning via Regressing Relative Rewards

Apr 25, 2024

Dataset Reset Policy Optimization for RLHF

Apr 15, 2024

Adversarial Imitation Learning via Boosting

Apr 12, 2024

Efficient and Sharp Off-Policy Evaluation in Robust Markov Decision Processes

Mar 29, 2024

RL for Consistency Models: Faster Reward Guided Text-to-Image Generation

Mar 25, 2024