Abstract:Datasets in engineering domains are often small, sparsely labeled, and contain numerical as well as categorical conditions. Additionally. computational resources are typically limited in practical applications which hinders the adoption of generative models for engineering tasks. We introduce a novel masked-conditioning approach, that enables generative models to work with sparse, mixed-type data. We mask conditions during training to simulate sparse conditions at inference time. For this purpose, we explore the use of various sparsity schedules that show different strengths and weaknesses. In addition, we introduce a flexible embedding that deals with categorical as well as numerical conditions. We integrate our method into an efficient variational autoencoder as well as a latent diffusion model and demonstrate the applicability of our approach on two engineering-related datasets of 2D point clouds and images. Finally, we show that small models trained on limited data can be coupled with large pretrained foundation models to improve generation quality while retaining the controllability induced by our conditioning scheme.
Abstract:Recent advancements in image synthesis are fueled by the advent of large-scale diffusion models. Yet, integrating realistic object visualizations seamlessly into new or existing backgrounds without extensive training remains a challenge. This paper introduces InsertDiffusion, a novel, training-free diffusion architecture that efficiently embeds objects into images while preserving their structural and identity characteristics. Our approach utilizes off-the-shelf generative models and eliminates the need for fine-tuning, making it ideal for rapid and adaptable visualizations in product design and marketing. We demonstrate superior performance over existing methods in terms of image realism and alignment with input conditions. By decomposing the generation task into independent steps, InsertDiffusion offers a scalable solution that extends the capabilities of diffusion models for practical applications, achieving high-quality visualizations that maintain the authenticity of the original objects.