Yijun Li

Zero-shot Image-to-Image Translation

Feb 06, 2023

Gaurav Parmar, Krishna Kumar Singh, Richard Zhang, Yijun Li, Jingwan Lu, Jun-Yan Zhu

Abstract: Large-scale text-to-image generative models have shown their remarkable ability to synthesize diverse and high-quality images. However, it is still challenging to directly apply these models for editing real images for two reasons. First, it is hard for users to come up with a perfect text prompt that accurately describes every visual detail in the input image. Second, while existing models can introduce desirable changes in certain regions, they often dramatically alter the input content and introduce unexpected changes in unwanted regions. In this work, we propose pix2pix-zero, an image-to-image translation method that can preserve the content of the original image without manual prompting. We first automatically discover editing directions that reflect desired edits in the text embedding space. To preserve the general content structure after editing, we further propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process. In addition, our method does not need additional training for these edits and can directly use the existing pre-trained text-to-image diffusion model. We conduct extensive experiments and show that our method outperforms existing and concurrent works for both real and synthetic image editing.

* website: https://pix2pixzero.github.io/
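As a rough illustration of what an "editing direction in the text embedding space" could look like, the sketch below takes the difference between the mean embedding of sentences about a source concept and the mean embedding of sentences about a target concept. This is an assumption made for illustration, not the authors' implementation: `toy_embed` is a hypothetical bag-of-words stand-in for a real text encoder, and the example sentences are invented.

```python
import numpy as np

def toy_embed(sentence: str, dim: int = 16) -> np.ndarray:
    """Hypothetical stand-in for a text encoder (assumption, not a real model).

    Each word maps to a fixed pseudo-random vector; a sentence is the sum
    of its word vectors.
    """
    vec = np.zeros(dim)
    for word in sentence.lower().split():
        seed = sum(ord(c) for c in word)  # deterministic per-word seed
        vec += np.random.default_rng(seed).standard_normal(dim)
    return vec

def edit_direction(source_sentences, target_sentences, embed=toy_embed):
    """Mean target embedding minus mean source embedding."""
    src = np.mean([embed(s) for s in source_sentences], axis=0)
    tgt = np.mean([embed(s) for s in target_sentences], axis=0)
    return tgt - src

# Invented example sentences for a cat -> dog edit.
source = ["a photo of a cat", "a cat on a sofa"]
target = ["a photo of a dog", "a dog on a sofa"]
direction = edit_direction(source, target)

# Shift the embedding of a new prompt along the direction.
edited = toy_embed("a cat in the garden") + direction
```

With this linear toy encoder the shared words cancel, so the direction reduces to the difference between the "dog" and "cat" word vectors. A real text encoder is not linear, so averaging over many sentences would only approximate such a direction.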

Contrastive Learning for Diverse Disentangled Foreground Generation

Nov 04, 2022

Yuheng Li, Yijun Li, Jingwan Lu, Eli Shechtman, Yong Jae Lee, Krishna Kumar Singh

Abstract: We introduce a new method for diverse foreground generation with explicit control over various factors. Existing inpainting-based foreground generation methods often struggle to generate diverse results and rarely allow users to explicitly control specific factors of variation (e.g., varying the facial identity or expression for face inpainting results). We leverage contrastive learning with latent codes to generate diverse foreground results for the same masked input. Specifically, we define two sets of latent codes, where one controls a pre-defined factor ("known"), and the other controls the remaining factors ("unknown"). The sampled latent codes from the two sets jointly bi-modulate the convolution kernels to guide the generator to synthesize diverse results. Experiments demonstrate the superiority of our method over state-of-the-art methods in result diversity and generation controllability.

* ECCV 2022

3D-FM GAN: Towards 3D-Controllable Face Manipulation

Aug 24, 2022

Yuchen Liu, Zhixin Shu, Yijun Li, Zhe Lin, Richard Zhang, S. Y. Kung

Abstract: 3D-controllable portrait synthesis has significantly advanced, thanks to breakthroughs in generative adversarial networks (GANs). However, it is still challenging to manipulate existing face images with precise 3D control. While concatenating GAN inversion and a 3D-aware, noise-to-image GAN is a straightforward solution, it is inefficient and may lead to a noticeable drop in editing quality. To fill this gap, we propose 3D-FM GAN, a novel conditional GAN framework that is designed specifically for 3D-controllable face manipulation and does not require any tuning after the end-to-end learning phase. By carefully encoding both the input face image and a physically-based rendering of 3D edits into a StyleGAN's latent spaces, our image generator provides high-quality, identity-preserved, 3D-controllable face manipulation. To effectively learn such a novel framework, we develop two essential training strategies and a novel multiplicative co-modulation architecture that improves significantly upon naive schemes. With extensive evaluations, we show that our method outperforms the prior art on various tasks, with better editability, stronger identity preservation, and higher photo-realism. In addition, we demonstrate the better generalizability of our design on large pose editing and out-of-domain images.

* Accepted to ECCV 2022. Project webpage: https://lychenyoko.github.io/3D-FM-GAN-Webpage/

Spatially-Adaptive Multilayer Selection for GAN Inversion and Editing

Jun 16, 2022

Gaurav Parmar, Yijun Li, Jingwan Lu, Richard Zhang, Jun-Yan Zhu, Krishna Kumar Singh

Abstract: Existing GAN inversion and editing methods work well for aligned objects with a clean background, such as portraits and animal faces, but often struggle for more difficult categories with complex scene layouts and object occlusions, such as cars, animals, and outdoor images. We propose a new method to invert and edit such complex images in the latent space of GANs, such as StyleGAN2. Our key idea is to explore inversion with a collection of layers, spatially adapting the inversion process to the difficulty of the image. We learn to predict the "invertibility" of different image segments and project each segment into a latent layer. Easier regions can be inverted into an earlier layer in the generator's latent space, while more challenging regions can be inverted into a later feature space. Experiments show that our method obtains better inversion results compared to the recent approaches on complex categories, while maintaining downstream editability. Please refer to our project page at https://www.cs.cmu.edu/~SAMInversion.

* CVPR 2022. GitHub: https://github.com/adobe-research/sam_inversion. Website: https://www.cs.cmu.edu/~SAMInversion

Emerging Artificial Intelligence Applications in Spatial Transcriptomics Analysis

Mar 18, 2022

Yijun Li, Stefan Stanojevic, Lana X. Garmire

Abstract: Spatial transcriptomics (ST) has advanced significantly in the last few years. Such advancement comes with the urgent need for novel computational methods to handle the unique challenges of ST data analysis. Many artificial intelligence (AI) methods have been developed to utilize various machine learning and deep learning techniques for computational ST analysis. This review provides a comprehensive and up-to-date survey of current AI methods for ST analysis.

Collaging Class-specific GANs for Semantic Image Synthesis

Oct 08, 2021

Yuheng Li, Yijun Li, Jingwan Lu, Eli Shechtman, Yong Jae Lee, Krishna Kumar Singh

Abstract: We propose a new approach for high-resolution semantic image synthesis. It consists of one base image generator and multiple class-specific generators. The base generator generates high-quality images based on a segmentation map. To further improve the quality of different objects, we create a bank of Generative Adversarial Networks (GANs) by separately training class-specific models. This has several benefits, including dedicated weights for each class; centrally aligned data for each model; additional training data from other sources; the potential for higher resolution and quality; and easy manipulation of a specific object in the scene. Experiments show that our approach can generate high-quality images in high resolution while having the flexibility of object-level control by using class-specific generators.

* ICCV 2021

Few-shot Image Generation via Cross-domain Correspondence

Apr 13, 2021

Utkarsh Ojha, Yijun Li, Jingwan Lu, Alexei A. Efros, Yong Jae Lee, Eli Shechtman, Richard Zhang

Abstract: Training generative models, such as GANs, on a target domain containing limited examples (e.g., 10) can easily result in overfitting. In this work, we seek to utilize a large source domain for pretraining and transfer the diversity information from source to target. We propose to preserve the relative similarities and differences between instances in the source via a novel cross-domain distance consistency loss. To further reduce overfitting, we present an anchor-based strategy to encourage different levels of realism over different regions in the latent space. With extensive results in both photorealistic and non-photorealistic domains, we demonstrate qualitatively and quantitatively that our few-shot model automatically discovers correspondences between source and target domains and generates more diverse and realistic images than previous methods.

* CVPR 2021
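One way to read "preserve the relative similarities and differences between instances" is sketched below: for each instance in a batch, turn its similarities to the other instances into a probability distribution, then penalize the divergence between the source-model and target-model distributions. The cosine similarity, the KL direction, and the function names are assumptions chosen for illustration; the abstract does not specify them.

```python
import numpy as np

def similarity_distributions(feats: np.ndarray) -> np.ndarray:
    """For each row i, softmax over cosine similarities to all rows j != i."""
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = feats.shape[0]
    # Drop the diagonal (self-similarity), keeping n - 1 entries per row.
    off_diag = sims[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    logits = off_diag - off_diag.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def distance_consistency_loss(source_feats, target_feats, eps=1e-12):
    """Mean KL(target || source) between per-instance similarity distributions.

    The KL direction is an assumption made for this sketch.
    """
    p_src = similarity_distributions(source_feats)
    p_tgt = similarity_distributions(target_feats)
    kl = np.sum(p_tgt * (np.log(p_tgt + eps) - np.log(p_src + eps)), axis=1)
    return float(kl.mean())

rng = np.random.default_rng(0)
src = rng.standard_normal((4, 8))      # features from the source model (toy data)
collapsed = np.tile(src[:1], (4, 1))   # a target model that maps everything to one output

same_loss = distance_consistency_loss(src, src)
collapse_loss = distance_consistency_loss(src, collapsed)
```

In this toy setup the loss is zero when the target reproduces the source's similarity structure and positive when the target collapses to a single output, which is the overfitting behavior such a loss is meant to discourage.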

IMAGINE: Image Synthesis by Image-Guided Model Inversion

Apr 13, 2021

Pei Wang, Yijun Li, Krishna Kumar Singh, Jingwan Lu, Nuno Vasconcelos

Abstract: We introduce an inversion based method, denoted as IMAge-Guided model INvErsion (IMAGINE), to generate high-quality and diverse images from only a single training sample. We leverage the knowledge of image semantics from a pre-trained classifier to achieve plausible generations via matching multi-level feature representations in the classifier, associated with adversarial training with an external discriminator. IMAGINE enables the synthesis procedure to simultaneously 1) enforce semantic specificity constraints during the synthesis, 2) produce realistic images without generator training, and 3) give users intuitive control over the generation process. With extensive experimental results, we demonstrate qualitatively and quantitatively that IMAGINE performs favorably against state-of-the-art GAN-based and inversion-based methods, across three different image domains (i.e., objects, scenes, and textures).

* Published in CVPR 2021

Rethinking and Improving the Robustness of Image Style Transfer

Apr 08, 2021

Pei Wang, Yijun Li, Nuno Vasconcelos

Abstract: Extensive research in neural style transfer methods has shown that the correlation between features extracted by a pre-trained VGG network has a remarkable ability to capture the visual style of an image. Surprisingly, however, this stylization quality is not robust and often degrades significantly when applied to features from more advanced and lightweight networks, such as those in the ResNet family. By performing extensive experiments with different network architectures, we find that residual connections, which represent the main architectural difference between VGG and ResNet, produce feature maps of small entropy, which are not suitable for style transfer. To improve the robustness of the ResNet architecture, we then propose a simple yet effective solution based on a softmax transformation of the feature activations that enhances their entropy. Experimental results demonstrate that this small magic can greatly improve the quality of stylization results, even for networks with random weights. This suggests that the architecture used for feature extraction is more important than the use of learned weights for the task of style transfer.

* Published in CVPR 2021 (Oral)
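The sketch below illustrates the idea of smoothing peaky activations with a softmax before computing style statistics. It is a toy example under stated assumptions, not the paper's implementation: the softmax is applied over the whole flattened feature map, entropy is measured on the activations normalized to sum to one, and the feature values are invented.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Softmax over all entries of x (the choice of axis is an assumption)."""
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(p: np.ndarray, eps: float = 1e-12) -> float:
    """Shannon entropy of a non-negative array that sums to one."""
    return float(-np.sum(p * np.log(p + eps)))

def gram_matrix(feats: np.ndarray) -> np.ndarray:
    """Channel-by-channel feature correlations; feats has shape (channels, positions)."""
    return feats @ feats.T / feats.shape[1]

# Invented sparse, peaky feature map: 3 channels x 4 spatial positions.
feats = np.array([
    [4.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 0.0],
])

raw_dist = feats / feats.sum()   # raw activations normalized to a distribution
smoothed = softmax(feats)        # softmax-transformed activations

raw_entropy = entropy(raw_dist)
smoothed_entropy = entropy(smoothed)

# Style statistics computed on the smoothed activations.
style_gram = gram_matrix(smoothed)
```

On this toy input the softmax spreads mass onto the zero activations, so the transformed map has higher entropy than the normalized raw map. That is not guaranteed for arbitrary inputs: with large activation magnitudes a plain softmax can itself become peaky.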

Content-Aware GAN Compression

Apr 06, 2021

Yuchen Liu, Zhixin Shu, Yijun Li, Zhe Lin, Federico Perazzi, S. Y. Kung

Abstract: Generative adversarial networks (GANs), e.g., StyleGAN2, play a vital role in various image generation and synthesis tasks, yet their notoriously high computational cost hinders their efficient deployment on edge devices. Directly applying generic compression approaches yields poor results on GANs, which motivates a number of recent GAN compression works. While prior works mainly accelerate conditional GANs, e.g., pix2pix and CycleGAN, compressing state-of-the-art unconditional GANs has rarely been explored and is more challenging. In this paper, we propose novel approaches for unconditional GAN compression. We first introduce effective channel pruning and knowledge distillation schemes specialized for unconditional GANs. We then propose a novel content-aware method to guide the processes of both pruning and distillation. With content-awareness, we can effectively prune channels that are unimportant to the contents of interest, e.g., human faces, and focus our distillation on these regions, which significantly enhances the distillation quality. On StyleGAN2 and SN-GAN, we achieve a substantial improvement over the state-of-the-art compression method. Notably, we reduce the FLOPs of StyleGAN2 by 11x with visually negligible image quality loss compared to the full-size model. More interestingly, when applied to various image manipulation tasks, our compressed model forms a smoother and better disentangled latent manifold, making it more effective for image editing.

* Published in CVPR 2021
