Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Chongxuan Li

MicroDreamer: Zero-shot 3D Generation in $\sim$20 Seconds by Score-based Iterative Reconstruction

Apr 30, 2024

Luxi Chen, Zhengyi Wang, Chongxuan Li, Tingting Gao, Hang Su, Jun Zhu

$Figure 1 for MicroDreamer: Zero-shot 3D Generation in $\sim$20 Seconds by Score-based Iterative Reconstruction$

$Figure 2 for MicroDreamer: Zero-shot 3D Generation in $\sim$20 Seconds by Score-based Iterative Reconstruction$

$Figure 3 for MicroDreamer: Zero-shot 3D Generation in $\sim$20 Seconds by Score-based Iterative Reconstruction$

$Figure 4 for MicroDreamer: Zero-shot 3D Generation in $\sim$20 Seconds by Score-based Iterative Reconstruction$

Abstract:Optimization-based approaches, such as score distillation sampling (SDS), show promise in zero-shot 3D generation but suffer from low efficiency, primarily due to the high number of function evaluations (NFEs) required for each sample. In this paper, we introduce score-based iterative reconstruction (SIR), an efficient and general algorithm for 3D generation with a multi-view score-based diffusion model. Given the images produced by the diffusion model, SIR reduces NFEs by repeatedly optimizing 3D parameters, unlike the single optimization in SDS, mimicking the 3D reconstruction process. With other improvements including optimization in the pixel space, we present an efficient approach called MicroDreamer that generally applies to various 3D representations and 3D generation tasks. In particular, retaining a comparable performance, MicroDreamer is 5-20 times faster than SDS in generating neural radiance field and takes about 20 seconds to generate meshes from 3D Gaussian splitting on a single A100 GPU, halving the time of the fastest zero-shot baseline, DreamGaussian. Our code is available at https://github.com/ML-GSAI/MicroDreamer.

Via

Access Paper or Ask Questions

Unifying Bayesian Flow Networks and Diffusion Models through Stochastic Differential Equations

Apr 24, 2024

Kaiwen Xue, Yuhao Zhou, Shen Nie, Xu Min, Xiaolu Zhang, Jun Zhou, Chongxuan Li

Figure 1 for Unifying Bayesian Flow Networks and Diffusion Models through Stochastic Differential Equations

Figure 2 for Unifying Bayesian Flow Networks and Diffusion Models through Stochastic Differential Equations

Figure 3 for Unifying Bayesian Flow Networks and Diffusion Models through Stochastic Differential Equations

Figure 4 for Unifying Bayesian Flow Networks and Diffusion Models through Stochastic Differential Equations

Abstract:Bayesian flow networks (BFNs) iteratively refine the parameters, instead of the samples in diffusion models (DMs), of distributions at various noise levels through Bayesian inference. Owing to its differentiable nature, BFNs are promising in modeling both continuous and discrete data, while simultaneously maintaining fast sampling capabilities. This paper aims to understand and enhance BFNs by connecting them with DMs through stochastic differential equations (SDEs). We identify the linear SDEs corresponding to the noise-addition processes in BFNs, demonstrate that BFN's regression losses are aligned with denoise score matching, and validate the sampler in BFN as a first-order solver for the respective reverse-time SDE. Based on these findings and existing recipes of fast sampling in DMs, we propose specialized solvers for BFNs that markedly surpass the original BFN sampler in terms of sample quality with a limited number of function evaluations (e.g., 10) on both image and text datasets. Notably, our best sampler achieves an increase in speed of 5~20 times for free. Our code is available at https://github.com/ML-GSAI/BFN-Solver.

Via

Access Paper or Ask Questions

CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model

Mar 08, 2024

Zhengyi Wang, Yikai Wang, Yifei Chen, Chendong Xiang, Shuo Chen, Dajiang Yu, Chongxuan Li, Hang Su, Jun Zhu

Abstract:Feed-forward 3D generative models like the Large Reconstruction Model (LRM) have demonstrated exceptional generation speed. However, the transformer-based methods do not leverage the geometric priors of the triplane component in their architecture, often leading to sub-optimal quality given the limited size of 3D data and slow training. In this work, we present the Convolutional Reconstruction Model (CRM), a high-fidelity feed-forward single image-to-3D generative model. Recognizing the limitations posed by sparse 3D data, we highlight the necessity of integrating geometric priors into network design. CRM builds on the key observation that the visualization of triplane exhibits spatial correspondence of six orthographic images. First, it generates six orthographic view images from a single input image, then feeds these images into a convolutional U-Net, leveraging its strong pixel-level alignment capabilities and significant bandwidth to create a high-resolution triplane. CRM further employs Flexicubes as geometric representation, facilitating direct end-to-end optimization on textured meshes. Overall, our model delivers a high-fidelity textured mesh from an image in just 10 seconds, without any test-time optimization.

* Project page: https://ml.cs.tsinghua.edu.cn/~zhengyi/CRM/

Via

Access Paper or Ask Questions

Graph Diffusion Policy Optimization

Feb 26, 2024

Yijing Liu, Chao Du, Tianyu Pang, Chongxuan Li, Wei Chen, Min Lin

Figure 1 for Graph Diffusion Policy Optimization

Figure 2 for Graph Diffusion Policy Optimization

Figure 3 for Graph Diffusion Policy Optimization

Figure 4 for Graph Diffusion Policy Optimization

Abstract:Recent research has made significant progress in optimizing diffusion models for specific downstream objectives, which is an important pursuit in fields such as graph generation for drug design. However, directly applying these models to graph diffusion presents challenges, resulting in suboptimal performance. This paper introduces graph diffusion policy optimization (GDPO), a novel approach to optimize graph diffusion models for arbitrary (e.g., non-differentiable) objectives using reinforcement learning. GDPO is based on an eager policy gradient tailored for graph diffusion models, developed through meticulous analysis and promising improved performance. Experimental results show that GDPO achieves state-of-the-art performance in various graph generation tasks with complex and diverse objectives. Code is available at https://github.com/sail-sg/GDPO.

Via

Access Paper or Ask Questions

Gaussian Mixture Solvers for Diffusion Models

Nov 02, 2023

Hanzhong Guo, Cheng Lu, Fan Bao, Tianyu Pang, Shuicheng Yan, Chao Du, Chongxuan Li

Figure 1 for Gaussian Mixture Solvers for Diffusion Models

Figure 2 for Gaussian Mixture Solvers for Diffusion Models

Figure 3 for Gaussian Mixture Solvers for Diffusion Models

Figure 4 for Gaussian Mixture Solvers for Diffusion Models

Abstract:Recently, diffusion models have achieved great success in generative tasks. Sampling from diffusion models is equivalent to solving the reverse diffusion stochastic differential equations (SDEs) or the corresponding probability flow ordinary differential equations (ODEs). In comparison, SDE-based solvers can generate samples of higher quality and are suited for image translation tasks like stroke-based synthesis. During inference, however, existing SDE-based solvers are severely constrained by the efficiency-effectiveness dilemma. Our investigation suggests that this is because the Gaussian assumption in the reverse transition kernel is frequently violated (even in the case of simple mixture data) given a limited number of discretization steps. To overcome this limitation, we introduce a novel class of SDE-based solvers called \emph{Gaussian Mixture Solvers (GMS)} for diffusion models. Our solver estimates the first three-order moments and optimizes the parameters of a Gaussian mixture transition kernel using generalized methods of moments in each step during sampling. Empirically, our solver outperforms numerous SDE-based solvers in terms of sample quality in image generation and stroke-based synthesis in various diffusion models, which validates the motivation and effectiveness of GMS. Our code is available at https://github.com/Guohanzhong/GMS.

* NeurIPS 2023

Via

Access Paper or Ask Questions

The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing

Nov 02, 2023

Shen Nie, Hanzhong Allan Guo, Cheng Lu, Yuhao Zhou, Chenyu Zheng, Chongxuan Li

Figure 1 for The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing

Figure 2 for The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing

Figure 3 for The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing

Figure 4 for The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing

Abstract:We present a unified probabilistic formulation for diffusion-based image editing, where a latent variable is edited in a task-specific manner and generally deviates from the corresponding marginal distribution induced by the original stochastic or ordinary differential equation (SDE or ODE). Instead, it defines a corresponding SDE or ODE for editing. In the formulation, we prove that the Kullback-Leibler divergence between the marginal distributions of the two SDEs gradually decreases while that for the ODEs remains as the time approaches zero, which shows the promise of SDE in image editing. Inspired by it, we provide the SDE counterparts for widely used ODE baselines in various tasks including inpainting and image-to-image translation, where SDE shows a consistent and substantial improvement. Moreover, we propose SDE-Drag -- a simple yet effective method built upon the SDE formulation for point-based content dragging. We build a challenging benchmark (termed DragBench) with open-set natural, art, and AI-generated images for evaluation. A user study on DragBench indicates that SDE-Drag significantly outperforms our ODE baseline, existing diffusion-based methods, and the renowned DragGAN. Our results demonstrate the superiority and versatility of SDE in image editing and push the boundary of diffusion-based editing methods.

Via

Access Paper or Ask Questions

BayesDiff: Estimating Pixel-wise Uncertainty in Diffusion via Bayesian Inference

Oct 17, 2023

Siqi Kou, Lei Gan, Dequan Wang, Chongxuan Li, Zhijie Deng

Figure 1 for BayesDiff: Estimating Pixel-wise Uncertainty in Diffusion via Bayesian Inference

Figure 2 for BayesDiff: Estimating Pixel-wise Uncertainty in Diffusion via Bayesian Inference

Figure 3 for BayesDiff: Estimating Pixel-wise Uncertainty in Diffusion via Bayesian Inference

Figure 4 for BayesDiff: Estimating Pixel-wise Uncertainty in Diffusion via Bayesian Inference

Abstract:Diffusion models have impressive image generation capability, but low-quality generations still exist, and their identification remains challenging due to the lack of a proper sample-wise metric. To address this, we propose BayesDiff, a pixel-wise uncertainty estimator for generations from diffusion models based on Bayesian inference. In particular, we derive a novel uncertainty iteration principle to characterize the uncertainty dynamics in diffusion, and leverage the last-layer Laplace approximation for efficient Bayesian inference. The estimated pixel-wise uncertainty can not only be aggregated into a sample-wise metric to filter out low-fidelity images but also aids in augmenting successful generations and rectifying artifacts in failed generations in text-to-image tasks. Extensive experiments demonstrate the efficacy of BayesDiff and its promise for practical applications.

Via

Access Paper or Ask Questions

On Memorization in Diffusion Models

Oct 04, 2023

Xiangming Gu, Chao Du, Tianyu Pang, Chongxuan Li, Min Lin, Ye Wang

Figure 1 for On Memorization in Diffusion Models

Figure 2 for On Memorization in Diffusion Models

Figure 3 for On Memorization in Diffusion Models

Figure 4 for On Memorization in Diffusion Models

Abstract:Due to their capacity to generate novel and high-quality samples, diffusion models have attracted significant research interest in recent years. Notably, the typical training objective of diffusion models, i.e., denoising score matching, has a closed-form optimal solution that can only generate training data replicating samples. This indicates that a memorization behavior is theoretically expected, which contradicts the common generalization ability of state-of-the-art diffusion models, and thus calls for a deeper understanding. Looking into this, we first observe that memorization behaviors tend to occur on smaller-sized datasets, which motivates our definition of effective model memorization (EMM), a metric measuring the maximum size of training data at which a learned diffusion model approximates its theoretical optimum. Then, we quantify the impact of the influential factors on these memorization behaviors in terms of EMM, focusing primarily on data distribution, model configuration, and training procedure. Besides comprehensive empirical results identifying the influential factors, we surprisingly find that conditioning training data on uninformative random labels can significantly trigger the memorization in diffusion models. Our study holds practical significance for diffusion model users and offers clues to theoretical research in deep generative models. Code is available at https://github.com/sail-sg/DiffMemorize.

Via

Access Paper or Ask Questions

Inversion-by-Inversion: Exemplar-based Sketch-to-Photo Synthesis via Stochastic Differential Equations without Training

Aug 15, 2023

Ximing Xing, Chuang Wang, Haitao Zhou, Zhihao Hu, Chongxuan Li, Dong Xu, Qian Yu

Figure 1 for Inversion-by-Inversion: Exemplar-based Sketch-to-Photo Synthesis via Stochastic Differential Equations without Training

Figure 2 for Inversion-by-Inversion: Exemplar-based Sketch-to-Photo Synthesis via Stochastic Differential Equations without Training

Figure 3 for Inversion-by-Inversion: Exemplar-based Sketch-to-Photo Synthesis via Stochastic Differential Equations without Training

Figure 4 for Inversion-by-Inversion: Exemplar-based Sketch-to-Photo Synthesis via Stochastic Differential Equations without Training

Abstract:Exemplar-based sketch-to-photo synthesis allows users to generate photo-realistic images based on sketches. Recently, diffusion-based methods have achieved impressive performance on image generation tasks, enabling highly-flexible control through text-driven generation or energy functions. However, generating photo-realistic images with color and texture from sketch images remains challenging for diffusion models. Sketches typically consist of only a few strokes, with most regions left blank, making it difficult for diffusion-based methods to produce photo-realistic images. In this work, we propose a two-stage method named ``Inversion-by-Inversion" for exemplar-based sketch-to-photo synthesis. This approach includes shape-enhancing inversion and full-control inversion. During the shape-enhancing inversion process, an uncolored photo is generated with the guidance of a shape-energy function. This step is essential to ensure control over the shape of the generated photo. In the full-control inversion process, we propose an appearance-energy function to control the color and texture of the final generated photo.Importantly, our Inversion-by-Inversion pipeline is training-free and can accept different types of exemplars for color and texture control. We conducted extensive experiments to evaluate our proposed method, and the results demonstrate its effectiveness.

* 15 pages, preprint version

Via

Access Paper or Ask Questions

MissDiff: Training Diffusion Models on Tabular Data with Missing Values

Jul 02, 2023

Yidong Ouyang, Liyan Xie, Chongxuan Li, Guang Cheng

Abstract:The diffusion model has shown remarkable performance in modeling data distributions and synthesizing data. However, the vanilla diffusion model requires complete or fully observed data for training. Incomplete data is a common issue in various real-world applications, including healthcare and finance, particularly when dealing with tabular datasets. This work presents a unified and principled diffusion-based framework for learning from data with missing values under various missing mechanisms. We first observe that the widely adopted "impute-then-generate" pipeline may lead to a biased learning objective. Then we propose to mask the regression loss of Denoising Score Matching in the training phase. We prove the proposed method is consistent in learning the score of data distributions, and the proposed training objective serves as an upper bound for the negative likelihood in certain cases. The proposed framework is evaluated on multiple tabular datasets using realistic and efficacious metrics and is demonstrated to outperform state-of-the-art diffusion model on tabular data with "impute-then-generate" pipeline by a large margin.

* 22 pages, short version is accepted by ICML workshop on Structured Probabilistic Inference & Generative Modeling 2023

Via

Access Paper or Ask Questions