Abstract:Recent breakthroughs in 3D generation have advanced notably with the development of text-to-image diffusion model. However, existing methods remain two practical challenges: (1) They primarily generate single 3D object, but struggle to generate multi-object compositional 3D assets due to the lack of the modeling for Gaussian primitives in reasonable interactions. (2) They often suffer from cross-view inconsistency during 3D optimization, as Score Distillation Sampling inherently performs on each single view, inevitably resulting in cross-view hallucinations. To solve above issues, we propose I2C-3D, a novel optimization-based method to generate multi-view consistent compositional 3D assets with reasonable interactions. Specifically, we propose an Inclusive Interactive Collisions strategy to guide Gaussian primitives appearing in reasonable interaction regions naturally, thereby ensuring objects in the compositional scene interact in a physically plausible and visually coherent way. Additionally, to enhance multi-view consistency, Multi-View Adaptive Score Distillation Sampling is devised to distill multi-view consistency prior and layout prior from pre-trained diffusion model by modulating attention map of instance token and spatial token across viewpoints. Benefiting from above elaborate designs, I2C-3D not only generates high-fidelity multi-view consistent compositional 3D assets but also supports 3D editing flexibly, facilitating complex scene generation. Extensive experiments demonstrate our I2C-3D outperforms existing methods in generation quality and multi-view consistency.
Abstract:Since the training data for the target model in a data-free black-box attack is not available, most recent schemes utilize GANs to generate data for training substitute model. However, these GANs-based schemes suffer from low training efficiency as the generator needs to be retrained for each target model during the substitute training process, as well as low generation quality. To overcome these limitations, we consider utilizing the diffusion model to generate data, and propose a data-free black-box attack scheme based on diffusion model to improve the efficiency and accuracy of substitute training. Despite the data generated by the diffusion model exhibits high quality, it presents diverse domain distributions and contains many samples that do not meet the discriminative criteria of the target model. To further facilitate the diffusion model to generate data suitable for the target model, we propose a Latent Code Augmentation (LCA) method to guide the diffusion model in generating data. With the guidance of LCA, the data generated by the diffusion model not only meets the discriminative criteria of the target model but also exhibits high diversity. By utilizing this data, it is possible to train substitute model that closely resemble the target model more efficiently. Extensive experiments demonstrate that our LCA achieves higher attack success rates and requires fewer query budgets compared to GANs-based schemes for different target models.