Jong Chul Ye

KAIST Graduate School of AI

Parallel Diffusion Models of Operator and Image for Blind Inverse Problems

Nov 19, 2022

Hyungjin Chung, Jeongsol Kim, Sehui Kim, Jong Chul Ye

Abstract: Diffusion model-based inverse problem solvers have demonstrated state-of-the-art performance in cases where the forward operator is known (i.e., non-blind). However, the applicability of the method to blind inverse problems has yet to be explored. In this work, we show that we can indeed solve a family of blind inverse problems by constructing another diffusion prior for the forward operator. Specifically, parallel reverse diffusion guided by gradients from the intermediate stages enables joint optimization of both the forward operator parameters and the image, so that both are estimated at the end of the parallel reverse diffusion procedure. We show the efficacy of our method on two representative tasks, blind deblurring and imaging through turbulence, where it yields state-of-the-art performance while remaining flexible enough to apply to general blind inverse problems when the functional forms are known.

* 25 pages, 13 figures

Magnitude and Angle Dynamics in Training Single ReLU Neurons

Oct 12, 2022

Sangmin Lee, Byeongsu Sim, Jong Chul Ye

Abstract: To understand the learning dynamics of deep ReLU networks, we investigate the dynamical system of gradient flow $w(t)$ by decomposing it into magnitude $\|w(t)\|$ and angle $\phi(t) := \pi - \theta(t)$ components. In particular, for multi-layer single ReLU neurons with a spherically symmetric data distribution and the square loss function, we provide upper and lower bounds for the magnitude and angle components to describe the dynamics of gradient flow. Using the obtained bounds, we conclude that small-scale initialization induces slow convergence for deep single ReLU neurons. Finally, by exploiting the relation between gradient flow and gradient descent, we extend our results to the gradient descent approach. All theoretical results are verified by experiments.
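The magnitude and angle decomposition mentioned in this abstract can be illustrated with a small numerical sketch. The setup below is an assumption made for illustration, not the paper's experiment: a shallow (one-layer) student ReLU neuron fitted to a teacher neuron `v` on standard Gaussian inputs with full-batch gradient descent, while the norm of `w` and the angle between `w` and `v` are recorded. The function name `train_single_relu` and all hyperparameters are hypothetical, and the sketch does not reproduce the depth-dependent bounds.

```python
import numpy as np

def magnitude_and_angle(w, v):
    """Return ||w|| and the angle (radians) between w and v."""
    norm_w = np.linalg.norm(w)
    cos = (w @ v) / (norm_w * np.linalg.norm(v))
    return norm_w, np.arccos(np.clip(cos, -1.0, 1.0))

def train_single_relu(w0, v, X, lr=0.1, steps=300):
    """Full-batch gradient descent on 0.5 * mean((relu(Xw) - relu(Xv))^2).

    Returns the final weights plus the recorded magnitude and angle
    after every step (index 0 is the initialization).
    """
    w = np.asarray(w0, dtype=float).copy()
    target = np.maximum(X @ v, 0.0)          # teacher outputs
    norms, angles = [], []
    m, a = magnitude_and_angle(w, v)
    norms.append(m)
    angles.append(a)
    for _ in range(steps):
        z = X @ w
        residual = np.maximum(z, 0.0) - target
        # The ReLU derivative is the indicator z > 0.
        grad = X.T @ (residual * (z > 0)) / len(X)
        w -= lr * grad
        m, a = magnitude_and_angle(w, v)
        norms.append(m)
        angles.append(a)
    return w, np.array(norms), np.array(angles)

rng = np.random.default_rng(0)
X = rng.standard_normal((4000, 5))           # spherically symmetric inputs
v = np.array([1.0, 0.0, 0.0, 0.0, 0.0])      # teacher direction (assumed)
w0 = np.array([0.0, 0.1, 0.0, 0.0, 0.0])     # small init, orthogonal to v
w, norms, angles = train_single_relu(w0, v, X)
print("initial / final magnitude:", norms[0], norms[-1])
print("initial / final angle (rad):", angles[0], angles[-1])
```

Plotting `norms` and `angles` against the step index gives the two trajectories separately, which is the view the decomposition is meant to provide.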

Self-supervised debiasing using low rank regularization

Oct 11, 2022

Geon Yeong Park, Chanyong Jung, Jong Chul Ye, Sang Wan Lee

Abstract: Spurious correlations can cause strong biases in deep neural networks, impairing generalization ability. While most existing debiasing methods require full supervision on either spurious attributes or target labels, training a debiased model from a limited amount of both annotations is still an open issue. To overcome such limitations, we first examine an interesting phenomenon through spectral analysis of latent representations: spuriously correlated, easy-to-learn attributes make neural networks inductively biased towards encoding lower effective rank representations. We also show that rank regularization can amplify this bias in a way that encourages highly correlated features. Motivated by these observations, we propose a self-supervised debiasing framework that is potentially compatible with unlabeled samples. We first pretrain a biased encoder in a self-supervised manner with the rank regularization, which serves as a semantic bottleneck that forces the encoder to learn the spuriously correlated attributes. This biased encoder is then used to discover and upweight bias-conflicting samples in a downstream task, acting as a boosting step that effectively debiases the main model. Remarkably, the proposed debiasing framework significantly improves the generalization performance of self-supervised learning baselines and, in some cases, even outperforms state-of-the-art supervised debiasing approaches.
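The "effective rank" of a representation matrix referred to in this abstract can be made concrete with a short sketch. It uses one common definition, the exponential of the Shannon entropy of the normalized singular values; whether the paper's regularizer uses exactly this quantity is an assumption here, and the function name `effective_rank` is hypothetical.

```python
import numpy as np

def effective_rank(features):
    """Effective rank of a (samples x dims) matrix.

    Computed as exp(H(p)), where p is the singular-value spectrum
    normalized to sum to one and H is the Shannon entropy.
    """
    s = np.linalg.svd(features, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]                      # skip exact zeros to avoid log(0)
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)
# Orthonormal bases used to build features with a controlled spectrum.
U, _ = np.linalg.qr(rng.standard_normal((100, 3)))
V, _ = np.linalg.qr(rng.standard_normal((8, 3)))

spread = U @ V.T                      # three equally strong directions
collapsed = np.outer(U[:, 0], V[:, 0])  # a single dominant direction

print("effective rank, spread features:   ", effective_rank(spread))
print("effective rank, collapsed features:", effective_rank(collapsed))
```

A representation dominated by a few highly correlated directions yields a lower value than one that spreads energy over many directions, which is the sense in which a biased encoder would show lower effective rank.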

Efficient debiasing with contrastive weight pruning

Oct 11, 2022

Geon Yeong Park, Sangmin Lee, Sang Wan Lee, Jong Chul Ye

Abstract: Neural networks are often biased toward spuriously correlated features that provide misleading statistical evidence that does not generalize. This raises a fundamental question: "Does an optimal unbiased functional subnetwork exist in a severely biased network? If so, how can such a subnetwork be extracted?" While a few studies have revealed the existence of such optimal subnetworks with the guidance of ground-truth unbiased samples, how to discover the optimal subnetworks with a biased training dataset remains unexplored in practice. To address this, we first present a theoretical insight that warns of potential limitations of existing algorithms in exploring unbiased subnetworks in the presence of strong spurious correlations. We then elucidate the importance of bias-conflicting samples for structure learning. Motivated by these observations, we propose a Debiased Contrastive Weight Pruning (DCWP) algorithm, which probes unbiased subnetworks without expensive group annotations. Experimental results demonstrate that our approach significantly outperforms state-of-the-art debiasing methods despite its considerable reduction in the number of parameters.

Diffusion-based Image Translation using Disentangled Style and Content Representation

Sep 30, 2022

Gihyun Kwon, Jong Chul Ye

Abstract: Diffusion-based image translation guided by semantic texts or a single target image has enabled flexible style transfer that is not limited to specific domains. Unfortunately, due to the stochastic nature of diffusion models, it is often difficult to maintain the original content of the image during the reverse diffusion. To address this, we present a novel diffusion-based unsupervised image translation method using disentangled style and content representation. Specifically, inspired by the splicing Vision Transformer, we extract intermediate keys of the multi-head self-attention layer from a ViT model and use them as the content preservation loss. Image-guided style transfer is then performed by matching the [CLS] classification token of the denoised samples and the target image, whereas an additional CLIP loss is used for text-driven style transfer. To further accelerate the semantic change during the reverse diffusion, we also propose a novel semantic divergence loss and resampling strategy. Our experimental results show that the proposed method outperforms state-of-the-art baseline models in both text-guided and image-guided translation tasks.

Diffusion Posterior Sampling for General Noisy Inverse Problems

Sep 29, 2022

Hyungjin Chung, Jeongsol Kim, Michael T. Mccann, Marc L. Klasky, Jong Chul Ye

Abstract: Diffusion models have recently been studied as powerful generative inverse problem solvers, owing to their high-quality reconstructions and the ease of combining them with existing iterative solvers. However, most works focus on solving simple linear inverse problems in noiseless settings, which significantly under-represents the complexity of real-world problems. In this work, we extend diffusion solvers to efficiently handle general noisy (non)linear inverse problems via the Laplace approximation of the posterior sampling. Interestingly, the resulting posterior sampling scheme is a blended version of diffusion sampling with the manifold constrained gradient without a strict measurement consistency projection step, yielding a more desirable generative path in noisy settings compared to previous studies. Our method demonstrates that diffusion models can incorporate various measurement noise statistics such as Gaussian and Poisson, and can also efficiently handle noisy nonlinear inverse problems such as Fourier phase retrieval and non-uniform deblurring.

* Code available at https://github.com/DPS2022/diffusion-posterior-sampling
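The idea of steering reverse diffusion with a measurement-consistency gradient, without a hard projection step, can be sketched on a toy problem. Everything below is an illustrative assumption rather than the released code: the prior is a standard Gaussian, so the score of every noised marginal is exactly `-x`, the forward operator `A` is a linear mask that observes the first two of four coordinates, and the guidance step size `zeta` is a constant. The function name `guided_reverse_diffusion` is hypothetical.

```python
import numpy as np

def guided_reverse_diffusion(y, A, n_steps=200, zeta=0.5, guide=True, seed=0):
    """Toy ancestral sampler with an optional measurement gradient step."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, n_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(A.shape[1])      # start from N(0, I)
    for t in range(n_steps - 1, -1, -1):
        score = -x                           # exact score for the toy N(0, I) prior
        # Denoised estimate of x_0 from x_t (Tweedie's formula).
        x0_hat = (x + (1.0 - alpha_bars[t]) * score) / np.sqrt(alpha_bars[t])
        # Unconditional ancestral step; no noise is added at the last step.
        noise = rng.standard_normal(x.shape) if t > 0 else np.zeros_like(x)
        x_next = (x + betas[t] * score) / np.sqrt(alphas[t]) + np.sqrt(betas[t]) * noise
        if guide:
            # Gradient of ||y - A x0_hat||^2 with respect to x_t.
            # For this prior, d(x0_hat)/d(x_t) = sqrt(alpha_bar_t).
            grad = -2.0 * np.sqrt(alpha_bars[t]) * (A.T @ (y - A @ x0_hat))
            x_next = x_next - zeta * grad
        x = x_next
    return x

A = np.eye(4)[:2]                            # observe the first two coordinates
x_true = np.array([1.5, -2.0, 0.5, 1.0])
y = A @ x_true + 0.01 * np.random.default_rng(1).standard_normal(2)

x_guided = guided_reverse_diffusion(y, A, guide=True)
x_plain = guided_reverse_diffusion(y, A, guide=False)
print("residual with guidance:   ", np.linalg.norm(y - A @ x_guided))
print("residual without guidance:", np.linalg.norm(y - A @ x_plain))
```

In a realistic setting the analytic score would be replaced by a trained score network and the gradient would be obtained by automatic differentiation through the denoised estimate; the unobserved coordinates here are simply drawn from the prior.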

Denoising MCMC for Accelerating Diffusion-Based Generative Models

Sep 29, 2022

Beomsu Kim, Jong Chul Ye

Abstract: Diffusion models are powerful generative models that simulate the reverse of diffusion processes using score functions to synthesize data from noise. The sampling process of diffusion models can be interpreted as solving the reverse stochastic differential equation (SDE) or the ordinary differential equation (ODE) of the diffusion process, which often requires up to thousands of discretization steps to generate a single image. This has sparked great interest in developing efficient integration techniques for reverse-S/ODEs. Here, we propose an orthogonal approach to accelerating score-based sampling: Denoising MCMC (DMCMC). DMCMC first uses MCMC to produce samples in the product space of data and variance (or diffusion time). Then, a reverse-S/ODE integrator is used to denoise the MCMC samples. Since MCMC traverses close to the data manifold, the computation cost of producing a clean sample for DMCMC is much less than that of producing a clean sample from noise. To verify the proposed concept, we show that Denoising Langevin Gibbs (DLG), an instance of DMCMC, successfully accelerates all six reverse-S/ODE integrators considered in this work on the tasks of CIFAR10 and CelebA-HQ-256 image generation. Notably, combined with the integrators of Karras et al. (2022) and the pre-trained score models of Song et al. (2021b), DLG achieves SOTA results. In settings with a limited number of score function evaluations (NFE) on CIFAR10, we obtain $3.86$ FID with $\approx 10$ NFE and $2.63$ FID with $\approx 20$ NFE. On CelebA-HQ-256, we obtain $6.99$ FID with $\approx 160$ NFE, which beats the current best record of Kim et al. (2022) among score-based models, $7.16$ FID with $4000$ NFE. Code: https://github.com/1202kbs/DMCMC

Diffusion Adversarial Representation Learning for Self-supervised Vessel Segmentation

Sep 29, 2022

Boah Kim, Yujin Oh, Jong Chul Ye

Abstract: Vessel segmentation in medical images is an important task in the diagnosis of vascular diseases and in therapy planning. Although learning-based segmentation approaches have been extensively studied, supervised methods require a large amount of ground-truth labels, and confusing background structures make it hard for neural networks to segment vessels in an unsupervised manner. To address this, we introduce a novel diffusion adversarial representation learning (DARL) model that leverages a denoising diffusion probabilistic model with adversarial learning, and apply it to vessel segmentation. In particular, for self-supervised vessel segmentation, DARL learns the background image distribution using a diffusion module, which lets a generation module effectively provide vessel representations. Also, through adversarial learning based on the proposed switchable spatially-adaptive denormalization, our model estimates synthetic fake vessel images as well as vessel segmentation masks, which further helps the model capture vessel-relevant semantic information. Once trained, the model generates segmentation masks in one step and can be applied to general vascular structure segmentation of coronary angiography and retinal images. Experimental results on various datasets show that our method significantly outperforms existing unsupervised and self-supervised methods in vessel segmentation.

Alternating Cross-attention Vision-Language Model for Efficient Learning with Medical Image and Report without Curation

Aug 10, 2022

Sangjoon Park, Eun Sun Lee, Jeong Eun Lee, Jong Chul Ye

Abstract: Recent advances in vision-language pre-training have demonstrated astounding performance in diverse vision-language tasks, shedding light on the long-standing problem of comprehensively understanding both visual and textual concepts in artificial intelligence research. However, vision-language pre-training has had limited success in the medical domain: current vision-language models and learning strategies designed for photographic images and captions are not optimal for medical data, which are usually insufficient in amount and diversity, and this impedes successful learning of joint vision-language concepts. In this study, we introduce MAX-VL, a model tailored for efficient vision-language pre-training in the medical domain. We experimentally demonstrate that the pre-trained MAX-VL model outperforms the current state-of-the-art vision-language models in various vision-language tasks. We also suggest its clinical utility for the diagnosis of newly emerging diseases and human error detection, and show the widespread applicability of the model to data from different domains.

Pyramidal Denoising Diffusion Probabilistic Models

Aug 03, 2022

Dohoon Ryu, Jong Chul Ye

Abstract: Diffusion models have demonstrated impressive image generation performance and have been used in various computer vision tasks. Unfortunately, image generation using diffusion models is very time-consuming since it requires thousands of sampling steps. To address this problem, we present a novel pyramidal diffusion model that generates high-resolution images starting from much coarser resolution images using a single score function trained with a positional embedding. This enables time-efficient sampling for image generation and also solves the low batch size problem when training with limited resources. Furthermore, we show that the proposed approach can be efficiently used for the multi-scale super-resolution problem using a single score function.
