Abstract:Diffusion models have emerged as the leading approach for style transfer, yet they struggle with photo-realistic transfers, often producing painting-like results or missing detailed stylistic elements. Current methods inadequately address unwanted influence from original content styles and style reference content features. We introduce SCAdapter, a novel technique leveraging CLIP image space to effectively separate and integrate content and style features. Our key innovation systematically extracts pure content from content images and style elements from style references, ensuring authentic transfers. This approach is enhanced through three components: Controllable Style Adaptive Instance Normalization (CSAdaIN) for precise multi-style blending, KVS Injection for targeted style integration, and a style transfer consistency objective maintaining process coherence. Comprehensive experiments demonstrate SCAdapter significantly outperforms state-of-the-art methods in both conventional and diffusion-based baselines. By eliminating DDIM inversion and inference-stage optimization, our method achieves at least $2\times$ faster inference than other diffusion-based approaches, making it both more effective and efficient for practical applications.
Abstract:Diffusion models are emerging as powerful solutions for generating high-fidelity and diverse images, often surpassing GANs under many circumstances. However, their slow inference speed hinders their potential for real-time applications. To address this, DiffusionGAN leveraged a conditional GAN to drastically reduce the denoising steps and speed up inference. Its advancement, Wavelet Diffusion, further accelerated the process by converting data into wavelet space, thus enhancing efficiency. Nonetheless, these models still fall short of GANs in terms of speed and image quality. To bridge these gaps, this paper introduces the Latent Denoising Diffusion GAN, which employs pre-trained autoencoders to compress images into a compact latent space, significantly improving inference speed and image quality. Furthermore, we propose a Weighted Learning strategy to enhance diversity and image quality. Experimental results on the CIFAR-10, CelebA-HQ, and LSUN-Church datasets prove that our model achieves state-of-the-art running speed among diffusion models. Compared to its predecessors, DiffusionGAN and Wavelet Diffusion, our model shows remarkable improvements in all evaluation metrics. Code and pre-trained checkpoints: \url{https://github.com/thanhluantrinh/LDDGAN.git}