Konstantin Kulikov

Few-Shot Synthetic Data Generation with Diffusion Models for Downstream Vision Tasks

May 12, 2026

Daniil Dushenev, Nazariy Karpov, Daniil Zinovjev, Alexander Gorin, Konstantin Kulikov

Abstract: Class imbalance is a persistent challenge in visual recognition, particularly in safety-critical domains where collecting positive examples is expensive and rare events are inherently underrepresented. We propose a lightweight synthetic data augmentation pipeline that fine-tunes a LoRA adapter on as few as 20-50 real images of a rare class and uses a pretrained diffusion model to generate synthetic samples for training. We systematically vary the synthetic-to-real ratio and evaluate the approach across two structurally different domains: chest X-ray pathology classification (NIH ChestX-ray14) and industrial surface crack detection (Magnetic Tile Defect dataset). All evaluations are performed on held-out sets of real images only. Across both domains, synthetic augmentation consistently improves rare-class recall and F1 compared to training with real data alone. Performance improves with moderate synthetic augmentation and shows diminishing returns as the synthetic ratio increases. These results suggest that LoRA-adapted diffusion models provide a simple and scalable mechanism for augmenting rare classes, enabling effective learning in data-scarce scenarios across heterogeneous visual domains.

* 5 pages, 3 figures, 1 table. Accepted at SynData4CV Workshop @ CVPR 2026
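
The ratio sweep and rare-class metrics described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `mix_synthetic` and `recall_f1` are hypothetical, and defining the ratio as the number of synthetic samples per real rare-class image is an assumption, since the abstract does not give the exact definition.

```python
import random

def mix_synthetic(real, synthetic, ratio, seed=0):
    """Combine real rare-class samples with sampled synthetic ones.

    Assumption: `ratio` means synthetic samples per real sample,
    so ratio=2.0 with 30 real images adds 60 synthetic images.
    """
    n_syn = min(len(synthetic), round(ratio * len(real)))
    rng = random.Random(seed)  # fixed seed for a reproducible draw
    return list(real) + rng.sample(list(synthetic), n_syn)

def recall_f1(y_true, y_pred, positive=1):
    """Recall and F1 for the rare (positive) class only."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return recall, f1

# Placeholder identifiers stand in for image files.
real = [f"real_{i}" for i in range(30)]
synthetic = [f"syn_{i}" for i in range(100)]
train_set = mix_synthetic(real, synthetic, ratio=2.0)
```

In keeping with the abstract, the mixed set would be used for training only, while `recall_f1` would be computed on held-out real images.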

Kandinsky 3: Text-to-Image Synthesis for Multifunctional Generative Framework

Oct 28, 2024

Vladimir Arkhipkin, Viacheslav Vasilev, Andrei Filatov, Igor Pavlov, Julia Agafonova, Nikolai Gerasimenko, Anna Averchenkova, Evelina Mironova, Anton Bukashkin, Konstantin Kulikov (+2 more)

Abstract: Text-to-image (T2I) diffusion models are popular for introducing image manipulation methods, such as editing, image fusion, and inpainting. At the same time, image-to-video (I2V) and text-to-video (T2V) models are also built on top of T2I models. We present Kandinsky 3, a novel T2I model based on latent diffusion, achieving a high level of quality and photorealism. The key feature of the new architecture is the simplicity and efficiency of its adaptation for many types of generation tasks. We extend the base T2I model for various applications and create a multifunctional generation system that includes text-guided inpainting/outpainting, image fusion, text-image fusion, image variations generation, and I2V and T2V generation. We also present a distilled version of the T2I model that runs inference in 4 steps of the reverse process, 3 times faster than the base model, without reducing image quality. We deployed a user-friendly demo system in which all the features can be tested in the public domain. Additionally, we released the source code and checkpoints for Kandinsky 3 and the extended models. Human evaluations show that Kandinsky 3 achieves one of the highest quality scores among open-source generation systems.

* Accepted for EMNLP 2024 (Demo track)
