
David Yao

DiFFPO: Training Diffusion LLMs to Reason Fast and Furious via Reinforcement Learning

Oct 02, 2025
Via arXiv

Mallows-DPO: Fine-Tune Your LLM with Preference Dispersions

May 23, 2024
Via arXiv