Title: SUDO: Enhancing Text-to-Image Diffusion Models with Self-Supervised Direct Preference Optimization

Apr 20, 2025

Liang Peng, Boxi Wu, Haoran Cheng, Yibo Zhao, Xiaofei He

Abstract: Text-to-image diffusion models are typically improved by supervised fine-tuning (SFT) of a pre-trained base model. However, SFT primarily minimizes a pixel-level mean squared error (MSE) loss and neglects global, image-level optimization, which is crucial for high perceptual quality and structural coherence. In this paper, we introduce Self-sUpervised Direct preference Optimization (SUDO), a novel paradigm that optimizes both fine-grained details at the pixel level and global image quality. SUDO integrates direct preference optimization into training and generates preference image pairs in a self-supervised manner, enabling the model to prioritize global-level learning as a complement to the pixel-level MSE loss. As an effective alternative to supervised fine-tuning, SUDO can be applied to any text-to-image diffusion model. Importantly, it eliminates the costly data collection and annotation that traditional direct preference optimization methods typically require. Through extensive experiments on widely used models, including Stable Diffusion 1.5 and XL, we demonstrate that SUDO significantly enhances both global and local image quality. The code is available at https://github.com/SPengLiang/SUDO.
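To make the idea of pairing a pixel-level MSE term with a preference term concrete, here is a minimal sketch in plain Python. It is not taken from the SUDO repository. It assumes a Diffusion-DPO-style preference loss computed on scalar denoising errors, and the function names, the `beta` temperature, and the `pref_weight` mixing factor are all hypothetical. The abstract does not say how the self-supervised preference pairs are built, so the sketch simply takes the errors for the preferred and rejected images as inputs.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length sequences of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dpo_style_loss(err_w, err_l, ref_err_w, ref_err_l, beta=1.0):
    """Diffusion-DPO-style preference loss on denoising errors (assumed form).

    err_w / err_l: denoising error of the model being trained on the
    preferred ("winning") and rejected ("losing") image.
    ref_err_w / ref_err_l: the same errors under a frozen reference model.
    """
    # The margin is positive when the trained model improves on the preferred
    # image more than on the rejected one, relative to the reference model.
    margin = (ref_err_w - err_w) - (ref_err_l - err_l)
    x = beta * margin
    # -log(sigmoid(x)) computed in a numerically stable way.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def combined_loss(pred, target, err_w, err_l, ref_err_w, ref_err_l,
                  beta=1.0, pref_weight=0.5):
    """Pixel-level MSE plus a weighted preference term (hypothetical mix)."""
    pixel_term = mse(pred, target)
    pref_term = dpo_style_loss(err_w, err_l, ref_err_w, ref_err_l, beta)
    return pixel_term + pref_weight * pref_term
```

In this sketch the preference term equals log 2 when the trained model and the reference model behave identically on both images, and it falls below log 2 once the trained model fits the preferred image better than the rejected one. The objective and hyperparameters actually used by SUDO should be taken from the paper and the linked repository.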
