Krishna Kumar Singh

GIRAFFE HD: A High-Resolution 3D-aware Generative Model

Mar 28, 2022

Yang Xue, Yuheng Li, Krishna Kumar Singh, Yong Jae Lee

Abstract: 3D-aware generative models have shown that the introduction of 3D information can lead to more controllable image generation. In particular, the current state-of-the-art model GIRAFFE can control each object's rotation, translation, scale, and scene camera pose without corresponding supervision. However, GIRAFFE only operates well when the image resolution is low. We propose GIRAFFE HD, a high-resolution 3D-aware generative model that inherits all of GIRAFFE's controllable features while generating high-quality, high-resolution images ($512^2$ resolution and above). The key idea is to leverage a style-based neural renderer, and to independently generate the foreground and background to force their disentanglement while imposing consistency constraints that stitch them together into a coherent final image. We demonstrate state-of-the-art 3D controllable high-resolution image generation on multiple natural image datasets.

* CVPR 2022
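
The abstract describes generating foreground and background independently and then compositing them. As a rough illustration only, the sketch below shows generic per-pixel alpha compositing; the function name `composite`, the grayscale nested-list image layout, and the mask convention are assumptions made for this example and are not taken from the paper, whose actual compositing and consistency constraints are not specified in the abstract.

```python
def composite(foreground, background, mask):
    """Blend two same-sized grayscale images pixel by pixel.

    Hypothetical helper for illustration. Mask values lie in [0, 1]:
    1 keeps the foreground pixel, 0 keeps the background pixel.
    """
    if not (len(foreground) == len(background) == len(mask)):
        raise ValueError("inputs must have the same height")
    out = []
    for fg_row, bg_row, m_row in zip(foreground, background, mask):
        if not (len(fg_row) == len(bg_row) == len(m_row)):
            raise ValueError("inputs must have the same width")
        # Standard alpha blend: m * fg + (1 - m) * bg
        out.append([m * f + (1.0 - m) * b
                    for f, b, m in zip(fg_row, bg_row, m_row)])
    return out


# Toy 2x2 example: white foreground, black background, soft mask.
fg = [[1.0, 1.0], [1.0, 1.0]]
bg = [[0.0, 0.0], [0.0, 0.0]]
mask = [[1.0, 0.5], [0.0, 0.25]]
image = composite(fg, bg, mask)
```

With a white foreground over a black background, the composite simply reproduces the mask, which makes the role of the mask easy to inspect.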

InsetGAN for Full-Body Image Generation

Mar 14, 2022

Anna Frühstück, Krishna Kumar Singh, Eli Shechtman, Niloy J. Mitra, Peter Wonka, Jingwan Lu

Abstract: While GANs can produce photo-realistic images in ideal conditions for certain domains, the generation of full-body human images remains difficult due to the diversity of identities, hairstyles, clothing, and the variance in pose. Instead of modeling this complex domain with a single GAN, we propose a novel method to combine multiple pretrained GANs, where one GAN generates a global canvas (e.g., human body) and a set of specialized GANs, or insets, focus on different parts (e.g., faces, shoes) that can be seamlessly inserted onto the global canvas. We model the problem as jointly exploring the respective latent spaces such that the generated images can be combined, by inserting the parts from the specialized generators onto the global canvas, without introducing seams. We demonstrate the setup by combining a full body GAN with a dedicated high-quality face GAN to produce plausible-looking humans. We evaluate our results with quantitative metrics and user studies.

* Project webpage and video available at http://afruehstueck.github.io/insetgan
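
To make the idea of inserting an inset "without introducing seams" concrete, here is a minimal sketch of one possible seam measure: the mean squared difference between the inset's border pixels and the canvas pixels they would cover. The function `seam_cost`, its arguments, and the choice of squared error are hypothetical and chosen for illustration; the abstract does not state which objective the authors optimize over the latent spaces.

```python
def seam_cost(canvas, inset, top, left):
    """Mean squared mismatch along the border of an inset pasted onto a canvas.

    Hypothetical illustration. `canvas` and `inset` are grayscale images
    stored as nested lists; (top, left) is where the inset's upper-left
    corner lands on the canvas.
    """
    h, w = len(inset), len(inset[0])
    if top < 0 or left < 0 or top + h > len(canvas) or left + w > len(canvas[0]):
        raise ValueError("inset must lie fully inside the canvas")
    total, count = 0.0, 0
    for r in range(h):
        for c in range(w):
            # Only the outer ring of the inset contributes to the seam.
            if r in (0, h - 1) or c in (0, w - 1):
                diff = canvas[top + r][left + c] - inset[r][c]
                total += diff * diff
                count += 1
    return total / count


canvas = [[0.0] * 4 for _ in range(4)]
matching_inset = [[0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],   # interior differs, border matches
                  [0.0, 0.0, 0.0]]
clashing_inset = [[1.0, 1.0], [1.0, 1.0]]
cost_match = seam_cost(canvas, matching_inset, 0, 0)
cost_clash = seam_cost(canvas, clashing_inset, 1, 1)
```

A cost of this kind is zero when the border agrees with the canvas regardless of what the inset's interior contains, which is the property a joint latent search would try to reach.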

Dance In the Wild: Monocular Human Animation with Neural Dynamic Appearance Synthesis

Nov 10, 2021

Tuanfeng Y. Wang, Duygu Ceylan, Krishna Kumar Singh, Niloy J. Mitra

Abstract: Synthesizing dynamic appearances of humans in motion plays a central role in applications such as AR/VR and video editing. While many recent methods have been proposed to tackle this problem, handling loose garments with complex textures and high dynamic motion remains challenging. In this paper, we propose a video-based appearance synthesis method that tackles such challenges and demonstrates high-quality results for in-the-wild videos that have not been shown before. Specifically, we adapt a StyleGAN-based architecture to the task of person-specific, video-based motion retargeting. We introduce a novel motion signature that is used to modulate the generator weights to capture dynamic appearance changes and to regularize the single-frame-based pose estimates to improve temporal coherency. We evaluate our method on a set of challenging videos and show that our approach achieves state-of-the-art performance both qualitatively and quantitatively.

Collaging Class-specific GANs for Semantic Image Synthesis

Oct 08, 2021

Yuheng Li, Yijun Li, Jingwan Lu, Eli Shechtman, Yong Jae Lee, Krishna Kumar Singh

Abstract: We propose a new approach for high-resolution semantic image synthesis. It consists of one base image generator and multiple class-specific generators. The base generator generates high-quality images based on a segmentation map. To further improve the quality of different objects, we create a bank of Generative Adversarial Networks (GANs) by separately training class-specific models. This has several benefits, including dedicated weights for each class; centrally aligned data for each model; additional training data from other sources; the potential for higher resolution and quality; and easy manipulation of a specific object in the scene. Experiments show that our approach can generate high-quality images in high resolution while having the flexibility of object-level control by using class-specific generators.

* ICCV 2021

IMAGINE: Image Synthesis by Image-Guided Model Inversion

Apr 13, 2021

Pei Wang, Yijun Li, Krishna Kumar Singh, Jingwan Lu, Nuno Vasconcelos

Abstract: We introduce an inversion-based method, denoted as IMAge-Guided model INvErsion (IMAGINE), to generate high-quality and diverse images from only a single training sample. We leverage the knowledge of image semantics from a pre-trained classifier to achieve plausible generations via matching multi-level feature representations in the classifier, associated with adversarial training with an external discriminator. IMAGINE enables the synthesis procedure to simultaneously 1) enforce semantic specificity constraints during the synthesis, 2) produce realistic images without generator training, and 3) give users intuitive control over the generation process. With extensive experimental results, we demonstrate qualitatively and quantitatively that IMAGINE performs favorably against state-of-the-art GAN-based and inversion-based methods, across three different image domains (i.e., objects, scenes, and textures).

* Published in CVPR 2021

Generating Furry Cars: Disentangling Object Shape & Appearance across Multiple Domains

Apr 05, 2021

Utkarsh Ojha, Krishna Kumar Singh, Yong Jae Lee

Abstract: We consider the novel task of learning disentangled representations of object shape and appearance across multiple domains (e.g., dogs and cars). The goal is to learn a generative model that learns an intermediate distribution, which borrows a subset of properties from each domain, enabling the generation of images that did not exist in any domain exclusively. This challenging problem requires an accurate disentanglement of object shape, appearance, and background from each domain, so that the appearance and shape factors from the two domains can be interchanged. We augment an existing approach that can disentangle factors within a single domain but struggles to do so across domains. Our key technical contribution is to represent object appearance with a differentiable histogram of visual features, and to optimize the generator so that two images with the same latent appearance factor but different latent shape factors produce similar histograms. On multiple multi-domain datasets, we demonstrate our method leads to accurate and consistent appearance and shape transfer across domains.

* Camera ready version for ICLR 2021
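
The notion of a differentiable histogram can be sketched with soft binning: instead of dropping each value into exactly one bin, every value spreads a unit of mass over all bins with smooth weights, so the histogram varies smoothly with its inputs. The code below is a minimal sketch under stated assumptions: scalar features, a Gaussian kernel, and the names `soft_histogram`, `histogram_distance`, and `bandwidth` are all illustrative and not taken from the paper. A real implementation would live in an autodiff framework and operate on learned visual features.

```python
import math


def soft_histogram(values, centers, bandwidth=0.1):
    """Normalized histogram in which each value contributes softly to every bin.

    Hypothetical illustration using a Gaussian kernel around each bin center.
    """
    if not values:
        raise ValueError("values must be non-empty")
    hist = [0.0] * len(centers)
    for v in values:
        # Log-space weights, shifted by the max for numerical stability.
        logits = [-((v - c) ** 2) / (2.0 * bandwidth ** 2) for c in centers]
        peak = max(logits)
        weights = [math.exp(l - peak) for l in logits]
        total = sum(weights)
        for i, w in enumerate(weights):
            hist[i] += w / total  # each value adds total mass 1
    return [h / len(values) for h in hist]


def histogram_distance(h1, h2):
    """L1 distance between two histograms over the same bins."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


centers = [0.0, 0.5, 1.0]
hist_dark = soft_histogram([0.0, 0.05, 0.1], centers)
hist_bright = soft_histogram([0.9, 0.95, 1.0], centers)
same = histogram_distance(hist_dark, hist_dark)
different = histogram_distance(hist_dark, hist_bright)
```

In the spirit of the abstract, a training objective could penalize `histogram_distance` between two generated images that share an appearance code but differ in shape code; the specific loss used by the authors is not given here.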

Don't Judge an Object by Its Context: Learning to Overcome Contextual Bias

Jan 09, 2020

Krishna Kumar Singh, Dhruv Mahajan, Kristen Grauman, Yong Jae Lee, Matt Feiszli, Deepti Ghadiyaram

Abstract: Existing models often leverage co-occurrences between objects and their context to improve recognition accuracy. However, strongly relying on context risks a model's generalizability, especially when typical co-occurrence patterns are absent. This work focuses on addressing such contextual biases to improve the robustness of the learnt feature representations. Our goal is to accurately recognize a category in the absence of its context, without compromising on performance when it co-occurs with context. Our key idea is to decorrelate feature representations of a category from its co-occurring context. We achieve this by learning a feature subspace that explicitly represents categories occurring in the absence of context alongside a joint feature subspace that represents both categories and context. Our very simple yet effective method is extensible to two multi-label tasks: object and attribute classification. On 4 challenging datasets, we demonstrate the effectiveness of our method in reducing contextual bias.

MixNMatch: Multifactor Disentanglement and Encoding for Conditional Image Generation

Nov 27, 2019

Yuheng Li, Krishna Kumar Singh, Utkarsh Ojha, Yong Jae Lee

Abstract: We present MixNMatch, a conditional generative model that learns to disentangle and encode background, object pose, shape, and texture from real images with minimal supervision, for mix-and-match image generation. We build upon FineGAN, an unconditional generative model, to learn the desired disentanglement and image generator, and leverage adversarial joint image-code distribution matching to learn the latent factor encoders. MixNMatch requires bounding boxes during training to model background, but requires no other supervision. Through extensive experiments, we demonstrate MixNMatch's ability to accurately disentangle, encode, and combine multiple factors for mix-and-match image generation, including sketch2color, cartoon2img, and img2gif applications. Our code/models/demo can be found at https://github.com/Yuheng-Li/MixNMatch

Elastic-InfoGAN: Unsupervised Disentangled Representation Learning in Imbalanced Data

Oct 01, 2019

Utkarsh Ojha, Krishna Kumar Singh, Cho-Jui Hsieh, Yong Jae Lee

Abstract: We propose a novel unsupervised generative model, Elastic-InfoGAN, that learns to disentangle object identity from other low-level aspects in class-imbalanced datasets. We first investigate the issues surrounding the assumptions about uniformity made by InfoGAN, and demonstrate its inability to properly disentangle object identity in imbalanced data. Our key idea is to make the discovery of the discrete latent factor of variation invariant to identity-preserving transformations in real images, and use that as the signal to learn the latent distribution's parameters. Experiments on both artificial (MNIST) and real-world (YouTube-Faces) datasets demonstrate the effectiveness of our approach in imbalanced data by: (i) better disentanglement of object identity as a latent factor of variation; and (ii) better approximation of class imbalance in the data, as reflected in the learned parameters of the latent distribution.

FineGAN: Unsupervised Hierarchical Disentanglement for Fine-Grained Object Generation and Discovery

Nov 27, 2018

Krishna Kumar Singh, Utkarsh Ojha, Yong Jae Lee

Abstract: We propose FineGAN, a novel unsupervised GAN framework, which disentangles the background, object shape, and object appearance to hierarchically generate images of fine-grained object categories. To disentangle the factors without any supervision, our key idea is to use information theory to associate each factor with a latent code, and to condition the relationships between the codes in a specific way to induce the desired hierarchy. Through extensive experiments, we show that FineGAN achieves the desired disentanglement to generate realistic and diverse images belonging to fine-grained classes of birds, dogs, and cars. Using FineGAN's automatically learned features, we also cluster real images as a first attempt at solving the novel problem of unsupervised fine-grained object category discovery. Our video demo can be found at https://www.youtube.com/watch?v=tkk0SeWGu-8.

* Technical report
