Vincent Dumoulin

A guide to convolution arithmetic for deep learning

Jan 11, 2018

Vincent Dumoulin, Francesco Visin

Abstract: We introduce a guide to help deep learning practitioners understand and manipulate convolutional neural network architectures. The guide clarifies the relationship between various properties (input shape, kernel shape, zero padding, strides and output shape) of convolutional, pooling and transposed convolutional layers, as well as the relationship between convolutional and transposed convolutional layers. Relationships are derived for various cases, and are illustrated in order to make them intuitive.
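
The abstract names the quantities involved but not the formulas themselves. As a rough sketch, assuming the standard one-dimensional relationship o = floor((i + 2p - k) / s) + 1 for a convolution and o' = s(i' - 1) + k - 2p for the matching transposed convolution (valid when i + 2p - k is a multiple of s), the arithmetic could look like this. The function names are hypothetical and are not taken from the guide.

```python
def conv_output_size(i, k, p=0, s=1):
    """Output length of a 1-D convolution (assumed standard formula).

    i: input size, k: kernel size, p: zero padding per side, s: stride.
    """
    if i + 2 * p < k:
        raise ValueError("kernel does not fit inside the padded input")
    return (i + 2 * p - k) // s + 1


def transposed_conv_output_size(i, k, p=0, s=1):
    """Output length of the transposed convolution that undoes the shape
    change of a convolution with the same k, p and s.

    Assumes the forward convolution satisfied (input + 2p - k) % s == 0.
    """
    return s * (i - 1) + k - 2 * p


# A 5-wide input with a 3-wide kernel, padding 1 and stride 2 gives 3 outputs;
# the transposed convolution maps those 3 values back to a 5-wide signal.
forward = conv_output_size(5, 3, p=1, s=2)
backward = transposed_conv_output_size(forward, 3, p=1, s=2)
print(forward, backward)
```

For two-dimensional layers the same computation is applied independently along each axis.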

Improved Training of Wasserstein GANs

Dec 25, 2017

Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, Aaron Courville

Abstract: Generative Adversarial Networks (GANs) are powerful generative models, but suffer from training instability. The recently proposed Wasserstein GAN (WGAN) makes progress toward stable training of GANs, but sometimes can still generate only low-quality samples or fail to converge. We find that these problems are often due to the use of weight clipping in WGAN to enforce a Lipschitz constraint on the critic, which can lead to undesired behavior. We propose an alternative to clipping weights: penalize the norm of gradient of the critic with respect to its input. Our proposed method performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning, including 101-layer ResNets and language models over discrete data. We also achieve high quality generations on CIFAR-10 and LSUN bedrooms.

* NIPS camera-ready
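
To make "penalize the norm of gradient of the critic with respect to its input" concrete, here is a minimal sketch. It assumes a two-sided penalty of the form lam * (||grad|| - 1)^2 evaluated at a random point on the line between a real and a generated sample; the abstract does not spell out these details, so the penalty form, the coefficient `lam=10.0`, and the interpolation step are stated here as assumptions. The gradient is approximated with central finite differences to keep the example dependency-free, whereas real training code would use automatic differentiation. All names are hypothetical.

```python
import math
import random


def numerical_gradient(f, x, h=1e-5):
    """Central-difference approximation of the gradient of f at point x."""
    grad = []
    for j in range(len(x)):
        up, down = list(x), list(x)
        up[j] += h
        down[j] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def gradient_penalty(critic, real, fake, lam=10.0, rng=random):
    """Assumed penalty: lam * (||grad_x critic(x_hat)|| - 1)^2."""
    eps = rng.random()  # random mixing weight in [0, 1)
    x_hat = [eps * r + (1 - eps) * f for r, f in zip(real, fake)]
    grad = numerical_gradient(critic, x_hat)
    norm = math.sqrt(sum(g * g for g in grad))
    return lam * (norm - 1.0) ** 2


def linear_critic(x):
    """Toy critic whose input gradient is (3, 4) everywhere, so its norm is 5."""
    return 3.0 * x[0] + 4.0 * x[1]


penalty = gradient_penalty(linear_critic, real=[0.0, 1.0], fake=[1.0, 0.0])
print(penalty)  # close to 10 * (5 - 1)^2 = 160, up to finite-difference error
```

In this sketch the penalty would be added to the critic's loss in place of clipping its weights.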

Learning Visual Reasoning Without Strong Priors

Dec 18, 2017

Ethan Perez, Harm de Vries, Florian Strub, Vincent Dumoulin, Aaron Courville

Abstract: Achieving artificial visual reasoning - the ability to answer image-related questions which require a multi-step, high-level process - is an important step towards artificial general intelligence. This multi-modal task requires learning a question-dependent, structured reasoning process over images from language. Standard deep learning approaches tend to exploit biases in the data rather than learn this underlying structure, while leading methods learn to visually reason successfully but are hand-crafted for reasoning. We show that a general-purpose, Conditional Batch Normalization approach achieves state-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4% error rate. We outperform the next best end-to-end method (4.5%) and even methods that use extra supervision (3.1%). We probe our model to shed light on how it reasons, showing it has learned a question-dependent, multi-step process. Previous work has operated under the assumption that visual reasoning calls for a specialized architecture, but we show that a general architecture with proper conditioning can learn to visually reason effectively.

* Full AAAI 2018 paper is at arXiv:1709.07871. Presented at ICML 2017's Machine Learning in Speech and Language Processing Workshop. Code is at http://github.com/ethanjperez/film

FiLM: Visual Reasoning with a General Conditioning Layer

Dec 18, 2017

Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, Aaron Courville

Abstract: We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning - answering image-related questions which require a multi-step, high-level process - a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot.

* AAAI 2018. Code available at http://github.com/ethanjperez/film . Extends arXiv:1707.03017
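
The "feature-wise affine transformation" in the abstract can be illustrated with a small sketch: each feature map is scaled by one coefficient and shifted by another. This assumes the scale `gamma` and shift `beta` have already been produced from the conditioning information (for example, by a network reading the question); that generator is omitted here. The function name `film` and the nested-list layout are hypothetical and are not taken from the authors' repository.

```python
def film(features, gamma, beta):
    """Apply gamma[c] * x + beta[c] to every value x of feature map c.

    features: list of feature maps, each a 2-D list of floats.
    gamma, beta: one scale and one shift per feature map (assumed to come
    from a separate conditioning network that is not shown here).
    """
    if not (len(features) == len(gamma) == len(beta)):
        raise ValueError("need exactly one (gamma, beta) pair per feature map")
    return [
        [[g * value + b for value in row] for row in fmap]
        for fmap, g, b in zip(features, gamma, beta)
    ]


feature_maps = [
    [[1.0, 2.0], [3.0, 4.0]],  # feature map 0
    [[1.0, 1.0], [1.0, 1.0]],  # feature map 1
]
# Scale map 0 by 2 and shift by 1; zero out map 1 and replace it with 5.
modulated = film(feature_maps, gamma=[2.0, 0.0], beta=[1.0, 5.0])
print(modulated)
```

Setting a `gamma` entry to zero switches a feature map off entirely, which is one way conditioning can select which features influence later layers.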

Generative Adversarial Networks: An Overview

Oct 19, 2017

Antonia Creswell, Tom White, Vincent Dumoulin, Kai Arulkumaran, Biswa Sengupta, Anil A Bharath

Abstract: Generative adversarial networks (GANs) provide a way to learn deep representations without extensively annotated training data. They achieve this through deriving backpropagation signals through a competitive process involving a pair of networks. The representations that can be learned by GANs may be used in a variety of applications, including image synthesis, semantic image editing, style transfer, image super-resolution and classification. The aim of this review paper is to provide an overview of GANs for the signal processing community, drawing on familiar analogies and concepts where possible. In addition to identifying different methods for training and constructing GANs, we also point to remaining challenges in their theory and application.

* Accepted in the IEEE Signal Processing Magazine Special Issue on Deep Learning for Visual Understanding

Exploring the structure of a real-time, arbitrary neural artistic stylization network

Aug 24, 2017

Golnaz Ghiasi, Honglak Lee, Manjunath Kudlur, Vincent Dumoulin, Jonathon Shlens

Abstract: In this paper, we present a method which combines the flexibility of the neural algorithm of artistic style with the speed of fast style transfer networks to allow real-time stylization using any content/style image pair. We build upon recent work leveraging conditional instance normalization for multi-style transfer networks by learning to predict the conditional instance normalization parameters directly from a style image. The model is successfully trained on a corpus of roughly 80,000 paintings and is able to generalize to paintings previously unobserved. We demonstrate that the learned embedding space is smooth and contains a rich structure and organizes semantic information associated with paintings in an entirely unsupervised manner.

* Accepted as an oral presentation at British Machine Vision Conference (BMVC) 2017

Adversarially Learned Inference

Feb 21, 2017

Vincent Dumoulin, Ishmael Belghazi, Ben Poole, Olivier Mastropietro, Alex Lamb, Martin Arjovsky, Aaron Courville

Abstract: We introduce the adversarially learned inference (ALI) model, which jointly learns a generation network and an inference network using an adversarial process. The generation network maps samples from stochastic latent variables to the data space while the inference network maps training examples in data space to the space of latent variables. An adversarial game is cast between these two networks and a discriminative network is trained to distinguish between joint latent/data-space samples from the generative network and joint samples from the inference network. We illustrate the ability of the model to learn mutually coherent inference and generation networks through the inspections of model samples and reconstructions and confirm the usefulness of the learned representations by obtaining a performance competitive with state-of-the-art on the semi-supervised SVHN and CIFAR10 tasks.

A Learned Representation For Artistic Style

Feb 09, 2017

Vincent Dumoulin, Jonathon Shlens, Manjunath Kudlur

Abstract: The diversity of painting styles represents a rich visual vocabulary for the construction of an image. The degree to which one may learn and parsimoniously capture this visual vocabulary measures our understanding of the higher level features of paintings, if not images in general. In this work we investigate the construction of a single, scalable deep network that can parsimoniously capture the artistic style of a diversity of paintings. We demonstrate that such a network generalizes across a diversity of artistic styles by reducing a painting to a point in an embedding space. Importantly, this model permits a user to explore new painting styles by arbitrarily combining the styles learned from individual paintings. We hope that this work provides a useful step towards building rich models of paintings and offers a window on to the structure of the learned representation of artistic style.

* 9 pages. 15 pages of Appendix, International Conference on Learning Representations (ICLR) 2017

Theano: A Python framework for fast computation of mathematical expressions

May 09, 2016

The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov (+103 more)

Abstract: Theano is a Python library that allows to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano is being actively and continuously developed since 2008, multiple frameworks have been built on top of it and it has been used to produce many state-of-the-art machine learning models. The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently-introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.

* 19 pages, 5 figures

Discriminative Regularization for Generative Models

Feb 15, 2016

Alex Lamb, Vincent Dumoulin, Aaron Courville

Abstract: We explore the question of whether the representations learned by classifiers can be used to enhance the quality of generative models. Our conjecture is that labels correspond to characteristics of natural data which are most salient to humans: identity in faces, objects in images, and utterances in speech. We propose to take advantage of this by using the representations from discriminative classifiers to augment the objective function corresponding to a generative model. In particular we enhance the objective function of the variational autoencoder, a popular generative model, with a discriminative regularization term. We show that enhancing the objective function in this way leads to samples that are clearer and have higher visual quality than the samples from the standard variational autoencoders.
