Rafael Rafailov

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

Jul 05, 2024

OpenVLA: An Open-Source Vision-Language-Action Model

Jun 13, 2024

Scalable Ensembling For Mitigating Reward Overoptimisation

Jun 03, 2024

Offline Regularised Reinforcement Learning for Large Language Models Alignment

May 29, 2024

Efficient Imitation Learning with Conservative World Models

May 21, 2024

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

Apr 23, 2024

Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels

Apr 22, 2024

From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function

Apr 18, 2024

Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data

Apr 01, 2024

Disentangling Length from Quality in Direct Preference Optimization

Mar 28, 2024