Lior Wolf

Hierarchical Patch VAE-GAN: Generating Diverse Videos from a Single Sample

Jun 23, 2020
Shir Gur, Sagie Benaim, Lior Wolf

We consider the task of generating diverse and novel videos from a single video sample. Recently, new hierarchical patch-GAN based approaches were proposed for generating diverse images, given only a single sample at training time. When applied to videos, these approaches fail to generate diverse samples and often collapse into generating samples similar to the training video. We introduce a novel patch-based variational autoencoder (VAE) which allows for much greater diversity in generation. Using this tool, a new hierarchical video generation scheme is constructed: at coarse scales, our patch-VAE is employed, ensuring that samples are highly diverse. Subsequently, at finer scales, a patch-GAN renders the fine details, resulting in high-quality videos. Our experiments show that the proposed method produces diverse samples in both the image domain and the more challenging video domain.
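The coarse-to-fine split described above can be pictured as a pyramid of video shapes in which the coarsest levels are handled by the patch-VAE and the finer ones by the patch-GAN. The sketch below is only an illustration under stated assumptions: the downsampling factor, the choice to keep the temporal length fixed, and the number of VAE scales are all hypothetical parameters, not values from the paper.

```python
# Hypothetical sketch: a coarse-to-fine pyramid of video shapes and the
# generator type used at each scale (patch-VAE at coarse, patch-GAN at fine).

def build_scale_pyramid(frames, height, width, num_scales, factor=0.75):
    """Return (frames, height, width) per scale, coarsest first.

    `factor` (< 1) is an assumed per-scale spatial downsampling ratio;
    the temporal length is kept fixed here, which is also an assumption.
    """
    shapes = []
    for level in range(num_scales):
        # Level 0 is the finest scale; each further level is coarser.
        scale = factor ** level
        shapes.append((frames,
                       max(1, round(height * scale)),
                       max(1, round(width * scale))))
    return list(reversed(shapes))  # coarsest scale first


def assign_generators(num_scales, num_vae_scales):
    """Coarse scales use the patch-VAE (diversity), fine scales the patch-GAN (detail)."""
    return ["patch-VAE" if s < num_vae_scales else "patch-GAN"
            for s in range(num_scales)]


if __name__ == "__main__":
    for shape, gen in zip(build_scale_pyramid(13, 64, 64, 4, 0.5),
                          assign_generators(4, 2)):
        print(shape, gen)
```

How many scales go to each generator is the knob that trades diversity for fidelity in this picture.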

Wish You Were Here: Context-Aware Human Generation

May 21, 2020
Oran Gafni, Lior Wolf

We present a novel method for inserting objects, specifically humans, into existing images, such that they blend in a photorealistic manner while respecting the semantic context of the scene. Our method involves three subnetworks: the first generates the semantic map of the new person, given the pose of the other persons in the scene and an optional bounding box specification. The second network renders the pixels of the novel person and its blending mask, based on specifications in the form of multiple appearance components. A third network refines the generated face in order to match that of the target person. Our experiments present convincing high-resolution outputs in this novel and challenging application domain. In addition, the three networks are evaluated individually, demonstrating, for example, state-of-the-art results on pose transfer benchmarks.

Evaluation Metrics for Conditional Image Generation

Apr 26, 2020
Yaniv Benny, Tomer Galanti, Sagie Benaim, Lior Wolf

We present two new metrics for evaluating generative models in the class-conditional image generation setting. These metrics are obtained by generalizing the two most popular unconditional metrics: the Inception Score (IS) and the Fréchet Inception Distance (FID). A theoretical analysis shows the motivation behind each proposed metric and links the novel metrics to their unconditional counterparts. The link takes the form of a product in the case of IS and of an upper bound in the case of FID. We provide an extensive empirical evaluation, comparing the metrics to their unconditional variants and to other metrics, and utilize them to analyze existing generative models, thus providing additional insights about their performance, from unlearned classes to mode collapse.
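For reference, the standard unconditional Inception Score that these metrics generalize is IS = exp(E_x[KL(p(y|x) || p(y))]), where p(y|x) comes from a pretrained classifier and p(y) is its average over generated samples. The sketch below computes only that unconditional score from a list of probability vectors, which are assumed to be given; it does not reproduce the conditional metrics proposed in the paper.

```python
import math


def inception_score(probs, eps=1e-12):
    """Unconditional IS = exp(E_x[KL(p(y|x) || p(y))]).

    `probs` is a list of per-sample class-probability vectors p(y|x),
    e.g. softmax outputs of a pretrained classifier (assumed given).
    """
    n = len(probs)
    k = len(probs[0])
    # Marginal class distribution p(y), estimated by averaging over samples.
    marginal = [sum(p[c] for p in probs) / n for c in range(k)]
    mean_kl = 0.0
    for p in probs:
        # KL(p(y|x) || p(y)); classes with p(y|x) = 0 contribute nothing.
        mean_kl += sum(pc * (math.log(pc + eps) - math.log(mc + eps))
                       for pc, mc in zip(p, marginal) if pc > 0)
    return math.exp(mean_kl / n)
```

Confident, evenly spread predictions over K classes give a score of about K, while identical predictions for every sample give 1, which is why IS rewards both sample quality and class diversity.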

Structural-analogy from a Single Image Pair

Apr 16, 2020
Sagie Benaim, Ron Mokady, Amit Bermano, Daniel Cohen-Or, Lior Wolf

The task of unsupervised image-to-image translation has seen substantial advancements in recent years through the use of deep neural networks. Typically, the proposed solutions learn the characterizing distribution of two large, unpaired collections of images, and are able to alter the appearance of a given image while keeping its geometry intact. In this paper, we explore the capabilities of neural networks to understand image structure given only a single pair of images, A and B. We seek to generate images that are structurally aligned: that is, to generate an image that keeps the appearance and style of B, but has a structural arrangement that corresponds to A. The key idea is to map between image patches at different scales. This enables controlling the granularity at which analogies are produced, which determines the conceptual distinction between style and content. In addition to structural alignment, our method can be used to generate high-quality imagery in other conditional generation tasks utilizing images A and B only: guided image synthesis, style and texture transfer, text translation, as well as video translation. Our code and additional results are available at https://github.com/rmokady/structural-analogy/.
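Why patch scale controls granularity can be seen with a simple image pyramid: a patch of fixed size k taken at pyramid level s covers roughly (k * 2^s)^2 pixels of the original image, so coarse levels describe layout and fine levels describe texture. The sketch below, which assumes grayscale images stored as nested lists and 2x average pooling, only extracts such multi-scale patches; the patch mapping between A and B that the paper learns is not shown.

```python
def downsample2x(img):
    """Average-pool a 2-D grayscale image (list of rows) by a factor of 2."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * i][2 * j] + img[2 * i][2 * j + 1] +
              img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(w)] for i in range(h)]


def extract_patches(img, k):
    """All k x k patches (stride 1) of a 2-D image, each as a flat list."""
    h, w = len(img), len(img[0])
    return [[img[i + di][j + dj] for di in range(k) for dj in range(k)]
            for i in range(h - k + 1) for j in range(w - k + 1)]


def multiscale_patches(img, k, num_scales):
    """Fixed-size k x k patches at each pyramid level, finest level first.

    At level s each patch spans roughly (k * 2**s)^2 original pixels,
    so coarser levels capture coarser structure.
    """
    levels = []
    for _ in range(num_scales):
        levels.append(extract_patches(img, k))
        img = downsample2x(img)
    return levels
```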

On the Optimization Dynamics of Wide Hypernetworks

Apr 05, 2020
Etai Littwin, Tomer Galanti, Lior Wolf

Recent results in the theoretical study of deep learning have shown that the optimization dynamics of wide neural networks exhibit a surprisingly simple behaviour. In this work, we study the optimization dynamics of hypernetworks, which are architectures in which a learned meta-network produces the weights of a task-specific primary network. Hypernetworks have repeatedly been shown to obtain state-of-the-art results. However, their theoretical understanding is still lacking. As can be expected, the optimization process of multiplicative models is much more complicated than that of standard ReLU networks. It is shown that for an infinitely wide neural network with a gating layer, the cost function cannot be accurately approximated by its first-order Taylor approximation. Specifically, for a fixed-size primary network of depth $H$, the first $H$ terms of the Taylor approximation of the cost function are non-zero, even when the meta-network is infinitely wide. However, when both the meta and primary networks are infinitely wide, the learning dynamics are determined by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters, and the kernel of this process is given by the Hadamard product of the kernels induced by the meta and primary networks. As part of our study, we partially solve an open problem suggested by Dyer & Gur-Ari (2020) and show that the convergence rate of the $r$-th order term of the Taylor expansion of the cost function, along the optimization trajectories of SGD, is $n^{1-r}$, where $n$ is the width of the learned neural network, improving upon the $n^{-1}$ bound suggested by the conjecture of Dyer & Gur-Ari, while matching their empirical observations.

* The first two authors contributed equally
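The Hadamard-product statement can be illustrated with Gram matrices: for inputs (I, x), where I feeds the meta-network and x the primary network, the combined kernel is K((I,x),(I',x')) = K_meta(I,I') * K_primary(x,x'). By the Schur product theorem, the element-wise product of two positive semi-definite Gram matrices is again positive semi-definite, so the result is a valid kernel. In the sketch below, the RBF and linear kernels are placeholders chosen for illustration, not the limiting kernels derived in the paper.

```python
import math


def rbf_kernel(u, v, gamma=1.0):
    """Placeholder kernel standing in for the meta-network kernel."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def linear_kernel(u, v):
    """Placeholder kernel standing in for the primary-network kernel."""
    return sum(a * b for a, b in zip(u, v))


def hadamard_gram(points, k_meta, k_primary):
    """Gram matrix of K((I,x),(I',x')) = K_meta(I,I') * K_primary(x,x').

    `points` is a list of (I, x) pairs: I is the meta-network input and
    x is the primary-network input.
    """
    gram_meta = [[k_meta(p[0], q[0]) for q in points] for p in points]
    gram_prim = [[k_primary(p[1], q[1]) for q in points] for p in points]
    # Element-wise (Hadamard) product of the two Gram matrices.
    return [[a * b for a, b in zip(row_m, row_p)]
            for row_m, row_p in zip(gram_meta, gram_prim)]
```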

Voice Separation with an Unknown Number of Multiple Speakers

Feb 29, 2020
Eliya Nachmani, Yossi Adi, Lior Wolf

We present a new method for separating a mixed audio sequence in which multiple voices speak simultaneously. The new method employs gated neural networks that are trained to separate the voices at multiple processing steps, while keeping the speaker assigned to each output channel fixed. A different model is trained for every possible number of speakers, and the model with the largest number of speakers is employed to select the actual number of speakers in a given sample. Our method greatly outperforms the current state of the art, which, as we show, is not competitive for more than two speakers.
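One way to picture the model-selection step is to run the largest model, count the output channels that are not silent, and then run the model trained for that count. The sketch below assumes a mean-power criterion with a -40 dB silence threshold and a dictionary of callables as the model interface; the threshold, the criterion, and the `separate` interface are all hypothetical illustrations rather than the paper's exact procedure.

```python
import math


def channel_energy_db(signal, eps=1e-10):
    """Mean power of one output channel, in decibels."""
    power = sum(s * s for s in signal) / max(1, len(signal))
    return 10.0 * math.log10(power + eps)


def estimate_num_speakers(max_model_outputs, silence_db=-40.0):
    """Count non-silent output channels of the largest model.

    The energy criterion and the -40 dB threshold are assumptions.
    """
    return sum(1 for ch in max_model_outputs
               if channel_energy_db(ch) > silence_db)


def separate(mixture, models):
    """Hypothetical interface: `models` maps a speaker count C to a
    callable returning C separated channels for the mixture."""
    max_c = max(models)
    outputs = models[max_c](mixture)
    c = max(1, estimate_num_speakers(outputs))
    # Fall back to the largest model's outputs if no model exists for c.
    return models[c](mixture) if c in models else outputs
```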

ScopeFlow: Dynamic Scene Scoping for Optical Flow

Feb 25, 2020
Aviram Bar-Haim, Lior Wolf

We propose to modify the common training protocols of optical flow, leading to sizable accuracy improvements without adding to the computational complexity of the training process. The improvement is based on observing the bias in sampling challenging data that exists in the current training protocol, and on improving the sampling process. In addition, we find that both regularization and augmentation should decrease during the training protocol. Using an off-the-shelf model with a low parameter count, the method is ranked first among all methods on the MPI Sintel benchmark, improving the accuracy of the best two-frame method by more than 10%. The method also surpasses all similar architecture variants by more than 12% and 19.7% on the KITTI benchmarks, achieving the lowest Average End-Point Error on KITTI2012 among two-frame methods, without using extra datasets.
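Two quantities from this abstract can be made concrete. The first is a strength that decreases during training, which applies to both regularization and augmentation; the linear schedule below is an assumption, since the abstract does not state the schedule's shape. The second is the Average End-Point Error (AEPE), the mean Euclidean distance between predicted and ground-truth flow vectors. This sketch treats every pixel as valid and ignores the sparse ground-truth masks used on KITTI.

```python
import math


def decayed_strength(step, total_steps, start, end):
    """Linearly move a regularization/augmentation strength from `start`
    to `end` over training (the linear shape is an assumption)."""
    t = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * t


def average_end_point_error(pred_flow, gt_flow):
    """Mean Euclidean distance between predicted and ground-truth flow;
    each flow is a list of per-pixel (u, v) displacements."""
    errors = [math.hypot(pu - gu, pv - gv)
              for (pu, pv), (gu, gv) in zip(pred_flow, gt_flow)]
    return sum(errors) / len(errors)
```

For example, a prediction of [(0, 0), (3, 4)] against an all-zero ground truth has per-pixel errors 0 and 5, giving an AEPE of 2.5.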

A Critical View of the Structural Causal Model

Feb 23, 2020
Tomer Galanti, Ofir Nabati, Lior Wolf

In the univariate case, we show that by comparing the individual complexities of univariate cause and effect, one can identify the cause and the effect without considering their interaction at all. In our framework, complexities are captured by the reconstruction error of an autoencoder that operates on the quantiles of the distribution. Comparing the reconstruction errors of the two autoencoders, one for each variable, is shown to perform surprisingly well on the accepted causality directionality benchmarks. Hence, the decision as to which of the two is the cause and which is the effect may be based not on causality but on complexity. In the multivariate case, where one can ensure that the complexities of the cause and effect are balanced, we propose a new adversarial training method that mimics the disentangled structure of the causal model. We prove that in the multidimensional case, such modeling is likely to fit the data only in the direction of causality. Furthermore, a uniqueness result shows that the learned model is able to identify the underlying causal and residual (noise) components. Our multidimensional method outperforms existing methods from the literature on both synthetic and real-world datasets.

Comparing the Parameter Complexity of Hypernetworks and the Embedding-Based Alternative

Feb 23, 2020
Tomer Galanti, Lior Wolf

In the context of learning to map an input $I$ to a function $h_I:\mathcal{X}\to \mathbb{R}$, we compare two alternative methods: (i) an embedding-based method, which learns a fixed function in which $I$ is encoded as a conditioning signal $e(I)$ and the learned function takes the form $h_I(x) = q(x,e(I))$, and (ii) hypernetworks, in which the weights $\theta_I$ of the function $h_I(x) = g(x;\theta_I)$ are given by a hypernetwork $f$ as $\theta_I=f(I)$. We extend the theory of DeVore and provide a lower bound on the complexity of neural networks as function approximators, i.e., on the number of trainable parameters. This extension eliminates the requirement that the approximation method be robust. Our results are then used to compare the complexities of $q$ and $g$, showing that under certain conditions, and when letting the functions $e$ and $f$ be as large as we wish, $g$ can be smaller than $q$ by orders of magnitude. In addition, we show that for typical assumptions on the function to be approximated, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network and of an embedding method.
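The two parameterizations can be written side by side: the embedding method computes q(x, e(I)) with a fixed network q, while the hypernetwork computes g(x; f(I)), where g has no trainable weights of its own and all of its weights come from f. The toy sketch below assumes linear maps for e and f and a one-hidden-layer ReLU network for q and g; all sizes are hypothetical, and the parameter counts only show the bookkeeping for this toy choice, not the paper's bounds.

```python
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class EmbeddingModel:
    """h_I(x) = q(x, e(I)): e is linear, q is a one-hidden-layer ReLU net
    applied to the concatenation [x; e(I)] (toy assumption)."""

    def __init__(self, d_in, d_x, d_e, hidden, rng):
        self.E = rand_matrix(d_e, d_in, rng)          # e(I) = E I
        self.W = rand_matrix(hidden, d_x + d_e, rng)  # first layer of q
        self.v = [rng.gauss(0.0, 1.0) for _ in range(hidden)]  # output layer of q

    def __call__(self, I, x):
        z = list(x) + matvec(self.E, I)
        h = [max(0.0, a) for a in matvec(self.W, z)]
        return sum(a * b for a, b in zip(self.v, h))

    def num_params(self):
        return (len(self.E) * len(self.E[0])
                + len(self.W) * len(self.W[0]) + len(self.v))


class HyperModel:
    """h_I(x) = g(x; f(I)): f is linear and emits every weight of g,
    a one-hidden-layer ReLU net without trainable weights of its own."""

    def __init__(self, d_in, d_x, hidden, rng):
        self.d_x, self.hidden = d_x, hidden
        n_theta = hidden * d_x + hidden           # size of theta_I
        self.F = rand_matrix(n_theta, d_in, rng)  # theta_I = F I

    def __call__(self, I, x):
        theta = matvec(self.F, I)
        k = self.hidden * self.d_x
        # Unpack theta_I into g's first-layer rows and output weights.
        W = [theta[r * self.d_x:(r + 1) * self.d_x] for r in range(self.hidden)]
        v = theta[k:]
        h = [max(0.0, a) for a in matvec(W, x)]
        return sum(a * b for a, b in zip(v, h))

    def num_params(self):
        return len(self.F) * len(self.F[0])
```

Because g's weights are produced per input I, the hypernetwork lets the function's shape change multiplicatively with I. The embedding method can only shift q's inputs.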

Residual Tangent Kernels

Feb 18, 2020
Etai Littwin, Lior Wolf

A recent body of work has focused on the theoretical study of neural networks in the regime of large width. Specifically, it was shown that training infinitely wide and properly scaled vanilla ReLU networks using the L2 loss is equivalent to kernel regression using the Neural Tangent Kernel (NTK), which is deterministic and remains constant during training. In this work, we derive the form of the limiting kernel for architectures incorporating bypass connections, namely residual networks (ResNets), as well as densely connected networks (DenseNets). In addition, we derive finite width and depth corrections for both cases. Our analysis reveals that deep practical residual architectures might operate much closer to the "kernel regime" than their vanilla counterparts: in networks that do not use skip connections, convergence to the NTK requires one to fix the depth while increasing the layers' width, whereas our findings show that in ResNets, convergence to the NTK may occur when depth and width simultaneously tend to infinity, given proper initialization. In DenseNets, however, convergence to the NTK as the width tends to infinity is guaranteed, at a rate that is independent of both the depth and the scale of the weights.
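To make the NTK concrete, the empirical kernel of a finite network is the inner product of parameter gradients, Theta(x, x') = <grad_theta f(x), grad_theta f(x')>. The sketch below computes it in closed form for a plain two-layer ReLU network in the NTK parameterization, f(x) = (1/sqrt(n)) * sum_i a_i * relu(w_i . x). It is a vanilla network with no skip connections, so the residual and dense kernels derived in the paper are not reproduced. As the width n grows, this random kernel at initialization concentrates around a deterministic limit.

```python
import random


def empirical_ntk(x1, x2, W, a):
    """Theta(x1, x2) = <grad f(x1), grad f(x2)> for
    f(x) = (1/sqrt(n)) * sum_i a_i * relu(w_i . x), parameters (W, a)."""
    n = len(a)

    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    x12 = dot(x1, x2)
    total = 0.0
    for w_i, a_i in zip(W, a):
        z1, z2 = dot(w_i, x1), dot(w_i, x2)
        # From df/da_i = relu(w_i . x) / sqrt(n).
        total += max(0.0, z1) * max(0.0, z2)
        # From df/dw_i = a_i * 1[w_i . x > 0] * x / sqrt(n).
        if z1 > 0 and z2 > 0:
            total += a_i * a_i * x12
    return total / n  # the two 1/sqrt(n) factors combine into 1/n


def init_params(d, n, seed=0):
    """Standard Gaussian initialization of a width-n, input-dim-d network."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return W, a
```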
