Artem Babenko

Latent Transformations via NeuralODEs for GAN-based Image Editing

Nov 29, 2021

Valentin Khrulkov, Leyla Mirvakhabova, Ivan Oseledets, Artem Babenko

Abstract: Recent advances in high-fidelity semantic image editing heavily rely on the presumably disentangled latent spaces of state-of-the-art generative models, such as StyleGAN. Specifically, recent works show that it is possible to achieve decent controllability of attributes in face images via linear shifts along latent directions. Several recent methods address the discovery of such directions, implicitly assuming that state-of-the-art GANs learn latent spaces with inherently linearly separable attribute distributions and semantic vector arithmetic properties. In our work, we show that nonlinear latent code manipulations realized as flows of a trainable Neural ODE are beneficial for many practical non-face image domains with more complex non-textured factors of variation. In particular, we investigate a large number of datasets with known attributes and demonstrate that certain attribute manipulations are challenging to obtain with linear shifts only.

* Published at ICCV 2021
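
The contrast the abstract draws between linear shifts and flows of a vector field can be illustrated with a small sketch. It is not the paper's implementation: `rotation_field` is a hypothetical hand-written stand-in for a trained Neural ODE, and the fixed-step Euler integrator is an assumption chosen for brevity.

```python
import math

def linear_shift(w, direction, t):
    """Move latent code w a distance t along one fixed direction."""
    return [wi + t * di for wi, di in zip(w, direction)]

def rotation_field(w):
    """Hypothetical stand-in for a trained vector field f(w).

    It rotates the first two coordinates and leaves the rest untouched,
    so the direction of motion depends on the current latent code.
    """
    return [-w[1], w[0]] + [0.0] * (len(w) - 2)

def ode_flow(w, field, t, steps=1000):
    """Integrate dw/ds = field(w) from s = 0 to s = t with fixed-step Euler."""
    h = t / steps
    for _ in range(steps):
        v = field(w)
        w = [wi + h * vi for wi, vi in zip(w, v)]
    return w

w0 = [1.0, 0.0, 0.5]
shifted = linear_shift(w0, [0.0, 1.0, 0.0], math.pi / 2)  # straight-line path
flowed = ode_flow(w0, rotation_field, math.pi / 2)        # curved path
```

In this toy setting the linear shift leaves the first coordinate unchanged, while the flow carries the code along an arc to roughly `[0, 1, 0.5]`, a trajectory that no single fixed direction reproduces.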

Distilling the Knowledge from Normalizing Flows

Jun 25, 2021

Dmitry Baranchuk, Vladimir Aliev, Artem Babenko

Abstract: Normalizing flows are a powerful class of generative models demonstrating strong performance in several speech and vision problems. In contrast to other generative models, normalizing flows are latent variable models with tractable likelihoods and allow for stable training. However, they have to be carefully designed to represent invertible functions with efficient Jacobian determinant calculation. In practice, these requirements lead to overparameterized and sophisticated architectures that are inferior to alternative feed-forward models in terms of inference time and memory consumption. In this work, we investigate whether one can distill flow-based models into more efficient alternatives. We provide a positive answer to this question by proposing a simple distillation approach and demonstrating its effectiveness on state-of-the-art conditional flow-based models for image super-resolution and speech synthesis.

* ICML Workshop: INNF+2021 (Spotlight)
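
The abstract does not spell out the distillation objective, so the following is only a generic sketch of the idea: a compact feed-forward student is fitted to reproduce the outputs of a deeper invertible teacher on sampled latents. The affine `teacher`, the one-layer student, and the least-squares fit are all illustrative assumptions, not the authors' method.

```python
import random

def teacher(z):
    """Hypothetical teacher: a stack of invertible affine layers applied to noise z."""
    for scale, shift in [(2.0, 1.0), (0.5, -3.0), (4.0, 0.25)]:
        z = scale * z + shift
    return z

def fit_student(inputs, targets):
    """Closed-form least squares for a one-layer student y = a * z + b."""
    n = len(inputs)
    mean_z = sum(inputs) / n
    mean_y = sum(targets) / n
    cov = sum((z - mean_z) * (y - mean_y) for z, y in zip(inputs, targets))
    var = sum((z - mean_z) ** 2 for z in inputs)
    a = cov / var
    return a, mean_y - a * mean_z

rng = random.Random(0)
zs = [rng.gauss(0.0, 1.0) for _ in range(200)]       # latents fed to the teacher
a, b = fit_student(zs, [teacher(z) for z in zs])     # student mimics teacher outputs
```

Because this toy teacher collapses to the single affine map `4 * z - 9.75`, the student recovers it exactly with one layer; a real flow would leave a residual error that the student has to trade against speed and memory.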

Revisiting Deep Learning Models for Tabular Data

Jun 22, 2021

Yury Gorishniy, Ivan Rubachev, Valentin Khrulkov, Artem Babenko

Abstract: The necessity of deep learning for tabular data is still an unanswered question addressed by a large number of research efforts. The recent literature on tabular DL proposes several deep architectures reported to be superior to traditional "shallow" models like Gradient Boosted Decision Trees. However, since existing works often use different benchmarks and tuning protocols, it is unclear if the proposed models universally outperform GBDT. Moreover, the models are often not compared to each other; therefore, it is challenging to identify the best deep model for practitioners. In this work, we start from a thorough review of the main families of DL models recently developed for tabular data. We carefully tune and evaluate them on a wide range of datasets and reveal two significant findings. First, we show that the choice between GBDT and DL models highly depends on data and there is still no universally superior solution. Second, we demonstrate that a simple ResNet-like architecture is a surprisingly effective baseline, which outperforms most of the sophisticated models from the DL literature. Finally, we design a simple adaptation of the Transformer architecture for tabular data that becomes a new strong DL baseline and reduces the gap between GBDT and DL models on datasets where GBDT dominates.

* Code: https://github.com/yandex-research/rtdl

Disentangled Representations from Non-Disentangled Models

Feb 11, 2021

Valentin Khrulkov, Leyla Mirvakhabova, Ivan Oseledets, Artem Babenko

Abstract: Constructing disentangled representations is known to be a difficult task, especially in the unsupervised scenario. The dominating paradigm of unsupervised disentanglement is currently to train a generative model that separates different factors of variation in its latent space. This separation is typically enforced by training with specific regularization terms in the model's objective function. These terms, however, introduce additional hyperparameters responsible for the trade-off between disentanglement and generation quality. While tuning these hyperparameters is crucial for proper disentanglement, it is often unclear how to tune them without external supervision. This paper investigates an alternative route to disentangled representations. Namely, we propose to extract such representations from state-of-the-art generative models trained without disentangling terms in their objectives. This paradigm of post hoc disentanglement employs few or no hyperparameters when learning representations while achieving results on par with the existing state of the art, as shown by comparison in terms of established disentanglement metrics, fairness, and the abstract reasoning task. All our code and models are publicly available.

Functional Space Analysis of Local GAN Convergence

Feb 08, 2021

Valentin Khrulkov, Artem Babenko, Ivan Oseledets

Abstract: Recent work demonstrated the benefits of studying the continuous-time dynamics governing GAN training. However, these dynamics are analyzed in the model parameter space, which results in finite-dimensional dynamical systems. We propose a novel perspective where we study the local dynamics of adversarial training in the general functional space and show how it can be represented as a system of partial differential equations. Thus, the convergence properties can be inferred from the eigenvalues of the resulting differential operator. We show that these eigenvalues can be efficiently estimated from the target dataset before training. Our perspective reveals several insights on the practical tricks commonly used to stabilize GANs, such as gradient penalty, data augmentation, and advanced integration schemes. As an immediate practical benefit, we demonstrate how one can a priori select an optimal data augmentation strategy for a particular generation task.

Navigating the GAN Parameter Space for Semantic Image Editing

Dec 01, 2020

Anton Cherepkov, Andrey Voynov, Artem Babenko

Abstract: Generative Adversarial Networks (GANs) are currently an indispensable tool for visual editing, being a standard component of image-to-image translation and image restoration pipelines. Furthermore, GANs are especially useful for controllable generation since their latent spaces contain a wide range of interpretable directions, well suited for semantic editing operations. By gradually changing latent codes along these directions, one can produce impressive visual effects, unattainable without GANs. In this paper, we significantly expand the range of visual effects achievable with state-of-the-art models, like StyleGAN2. In contrast to existing works, which mostly operate on latent codes, we discover interpretable directions in the space of the generator parameters. Using several simple methods, we explore this space and demonstrate that it also contains a plethora of interpretable directions, which are an excellent source of non-trivial semantic manipulations. The discovered manipulations cannot be achieved by transforming the latent codes and can be used to edit both synthetic and real images. We release our code and models and hope they will serve as a handy tool for further efforts on GAN-based image editing.

* Supplementary code: https://github.com/yandex-research/navigan

Big GANs Are Watching You: Towards Unsupervised Object Segmentation with Off-the-Shelf Generative Models

Jun 08, 2020

Andrey Voynov, Stanislav Morozov, Artem Babenko

Abstract: Since collecting pixel-level groundtruth data is expensive, unsupervised visual understanding problems are currently an active research topic. In particular, several recent methods based on generative models have achieved promising results for object segmentation and saliency detection. However, since generative models are known to be unstable and sensitive to hyperparameters, the training of these methods can be challenging and time-consuming. In this work, we introduce an alternative, much simpler way to exploit generative models for unsupervised object segmentation. First, we explore the latent space of BigBiGAN -- the state-of-the-art unsupervised GAN, whose parameters are publicly available. We demonstrate that object saliency masks for GAN-produced images can be obtained automatically with BigBiGAN. These masks are then used to train a discriminative segmentation model. Being very simple and easy to reproduce, our approach provides competitive performance on common benchmarks in the unsupervised scenario.

Editable Neural Networks

Apr 01, 2020

Anton Sinitsin, Vsevolod Plokhotnyuk, Sergei Popov, Artem Babenko

Abstract: These days deep neural networks are ubiquitously used in a wide range of tasks, from image classification and machine translation to face identification and self-driving cars. In many applications, a single model error can lead to devastating financial, reputational and even life-threatening consequences. Therefore, it is crucially important to correct model mistakes quickly as they appear. In this work, we investigate the problem of neural network editing: how one can efficiently patch a mistake of the model on a particular sample, without influencing the model behavior on other samples. Namely, we propose Editable Training, a model-agnostic training technique that encourages fast editing of the trained model. We empirically demonstrate the effectiveness of this method on large-scale image classification and machine translation tasks.
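
The editing problem stated in the abstract, patching one mistake without disturbing other predictions, can be made concrete with a toy sketch. It shows only the patch step and the check on other samples, not Editable Training itself; the linear classifier, the perceptron-style update rule, and all sample values are assumptions made for illustration.

```python
def predict(weights, x):
    """Return class 1 if the linear score is positive, else class 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0

def edit(weights, x, target, lr=0.1, max_steps=100):
    """Apply perceptron-style updates on one sample until it is classified as target.

    Returns the patched weights and the number of update steps that were needed.
    """
    weights = list(weights)  # never mutate the caller's model
    for step in range(max_steps):
        if predict(weights, x) == target:
            return weights, step
        sign = 1.0 if target == 1 else -1.0
        weights = [w + lr * sign * xi for w, xi in zip(weights, x)]
    return weights, max_steps

weights = [1.0, -1.0]
mistake = [0.2, 1.0]                               # predicted 0, should be 1
others = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.5]]      # behavior we want to preserve

before = [predict(weights, x) for x in others]
edited, steps = edit(weights, mistake, target=1)
after = [predict(edited, x) for x in others]
```

Comparing `before` and `after` measures how much the patch leaked onto other samples, and `steps` measures how fast the edit was; these are the two quantities the abstract's problem statement cares about.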

Unsupervised Discovery of Interpretable Directions in the GAN Latent Space

Feb 18, 2020

Andrey Voynov, Artem Babenko

Abstract: The latent spaces of typical GAN models often have semantically meaningful directions. Moving in these directions corresponds to human-interpretable image transformations, such as zooming or recoloring, enabling a more controllable generation process. However, the discovery of such directions is currently performed in a supervised manner, requiring human labels, pretrained models, or some form of self-supervision. These requirements can severely limit the range of directions that existing approaches can discover. In this paper, we introduce an unsupervised method to identify interpretable directions in the latent space of a pretrained GAN model. By a simple model-agnostic procedure, we find directions corresponding to sensible semantic manipulations without any form of (self-)supervision. Furthermore, we reveal several non-trivial findings, which would be difficult to obtain by existing methods, e.g., a direction corresponding to background removal. As an immediate practical benefit of our work, we show how to exploit this finding to achieve a new state-of-the-art for the problem of saliency detection.

RPGAN: GANs Interpretability via Random Routing

Feb 17, 2020

Andrey Voynov, Artem Babenko

Abstract: In this paper, we introduce Random Path Generative Adversarial Network (RPGAN) -- an alternative design of GANs that can serve as a tool for generative model analysis. While the latent space of a typical GAN consists of input vectors, randomly sampled from the standard Gaussian distribution, the latent space of RPGAN consists of random paths in a generator network. As we show, this design makes it possible to understand the factors of variation captured by different generator layers, providing their natural interpretability. With experiments on standard benchmarks, we demonstrate that RPGAN reveals several interesting insights about the roles that different layers play in the image generation process. Aside from interpretability, the RPGAN model also provides competitive generation quality and allows efficient incremental learning on new data.
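
The notion of a latent space made of random paths can be sketched in a few lines. This is a toy illustration, not the RPGAN architecture: the scalar "blocks", the three-layer layout, and the fixed input `x=0` are assumptions chosen so the example stays self-contained.

```python
import random

# Hypothetical toy generator: each layer holds several interchangeable blocks.
layers = [
    [lambda x: x + 1, lambda x: x + 2, lambda x: x + 3],
    [lambda x: x * 2, lambda x: x * 3],
    [lambda x: x - 1, lambda x: x - 5],
]

def sample_path(layers, rng):
    """A latent code is a path: one block index chosen per layer."""
    return [rng.randrange(len(blocks)) for blocks in layers]

def generate(layers, path, x=0):
    """Run a fixed input through the blocks selected by the path."""
    for blocks, idx in zip(layers, path):
        x = blocks[idx](x)
    return x

rng = random.Random(0)
path = sample_path(layers, rng)
output = generate(layers, path)

# Vary only the first layer's block while holding the rest of the path fixed,
# which isolates the variation contributed by that layer.
variants = [generate(layers, [i] + path[1:]) for i in range(len(layers[0]))]
```

Changing one index of the path while freezing the others is what makes the per-layer analysis described in the abstract possible: any difference between the `variants` is attributable to the first layer alone.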
