Wen Sun

Correcting the Mythos of KL-Regularization: Direct Alignment without Overparameterization via Chi-squared Preference Optimization

Jul 18, 2024

On Speeding Up Language Model Evaluation

Jul 08, 2024

Orchestrating LLMs with Different Personalizations

Jul 04, 2024

Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics

Jun 17, 2024

Understanding Preference Fine-Tuning Through the Lens of Coverage

Jun 03, 2024

REBEL: Reinforcement Learning via Regressing Relative Rewards

Apr 25, 2024

Dataset Reset Policy Optimization for RLHF

Apr 15, 2024

Adversarial Imitation Learning via Boosting

Apr 12, 2024

Efficient and Sharp Off-Policy Evaluation in Robust Markov Decision Processes

Mar 29, 2024

RL for Consistency Models: Faster Reward Guided Text-to-Image Generation

Mar 25, 2024