Wei Xiong

Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts

Jun 18, 2024

BeamVQ: Aligning Space-Time Forecasting Model via Self-training on Physics-aware Metrics

May 27, 2024

RLHF Workflow: From Reward Modeling to Online RLHF

May 13, 2024

DPO Meets PPO: Reinforced Token Optimization for RLHF

Apr 29, 2024

SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing

Apr 08, 2024

IMPRINT: Generative Object Compositing by Learning Identity-Preserving Representation

Mar 15, 2024

Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization

Mar 13, 2024

Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards

Mar 06, 2024

Diffusion Model-Based Image Editing: A Survey

Feb 27, 2024

A Theoretical Analysis of Nash Learning from Human Feedback under General KL-Regularized Preference

Feb 11, 2024