Youssef Mroueh

IBM Research, USA

Wasserstein Barycenter Model Ensembling

Feb 13, 2019

Pierre Dognin, Igor Melnyk, Youssef Mroueh, Jerret Ross, Cicero Dos Santos, Tom Sercu

Abstract: In this paper we propose to perform model ensembling in a multiclass or a multilabel learning setting using Wasserstein (W.) barycenters. Optimal transport metrics, such as the Wasserstein distance, allow incorporating semantic side information such as word embeddings. Using W. barycenters to find the consensus between models allows us to balance confidence and semantics in finding the agreement between the models. We show applications of Wasserstein ensembling in attribute-based classification, multilabel learning and image caption generation. These results show that W. ensembling is a viable alternative to the basic geometric or arithmetic mean ensembling.

* ICLR 2019
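
To make the contrast with arithmetic mean ensembling concrete, here is a minimal sketch that ensembles two model outputs over three ordered labels. The barycenter is approximated with entropic regularization and iterative Bregman projections, which is one standard way to compute W. barycenters; the paper's exact solver may differ. The squared-index cost matrix, the value of `eps`, and the toy predictions are all assumptions chosen for illustration (in the paper's setting the cost would come from semantic side information such as word embeddings).

```python
import math

def arithmetic_mean(preds, weights):
    """Weighted arithmetic mean of probability vectors."""
    k = len(preds[0])
    return [sum(w * p[j] for w, p in zip(weights, preds)) for j in range(k)]

def matvec(mat, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in mat]

def entropic_barycenter(preds, weights, cost, eps=0.5, n_iter=200):
    """Approximate Wasserstein barycenter via iterative Bregman projections.

    Assumes strictly positive inputs and weights that sum to 1.
    """
    k = len(cost)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(k)] for i in range(k)]
    kernel_t = [list(col) for col in zip(*kernel)]
    v = [[1.0] * k for _ in preds]
    bary = [1.0 / k] * k
    for _ in range(n_iter):
        ktu = []
        for p, v_m in zip(preds, v):
            kv = matvec(kernel, v_m)
            u_m = [p_i / kv_i for p_i, kv_i in zip(p, kv)]  # match input marginal
            ktu.append(matvec(kernel_t, u_m))
        # Shared marginal: weighted geometric mean of the K^T u_m vectors.
        bary = [
            math.exp(sum(w * math.log(col[j]) for w, col in zip(weights, ktu)))
            for j in range(k)
        ]
        v = [[b / c for b, c in zip(bary, col)] for col in ktu]
    total = sum(bary)  # equals 1 at convergence; renormalize for safety
    return [b / total for b in bary]

# Toy example: two confident models that disagree on neighbouring labels.
preds = [[0.98, 0.01, 0.01], [0.01, 0.01, 0.98]]
weights = [0.5, 0.5]
cost = [[float((i - j) ** 2) for j in range(3)] for i in range(3)]

arith = arithmetic_mean(preds, weights)
bary = entropic_barycenter(preds, weights, cost)
```

In this toy setup the arithmetic mean keeps almost all mass on the two extreme labels, while the barycenter moves most of it to the label that lies between them under the cost, which is the sense in which the ground metric injects semantics into the consensus.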

Improved Image Captioning with Adversarial Semantic Alignment

Jun 01, 2018

Pierre L. Dognin, Igor Melnyk, Youssef Mroueh, Jarret Ross, Tom Sercu

Abstract: We study image captioning as conditional GAN training, proposing both a context-aware LSTM captioner and a co-attentive discriminator, which enforces semantic alignment between images and captions. We empirically study the viability of two training methods: Self-critical Sequence Training (SCST) and Gumbel Straight-Through (ST). We show that, surprisingly, SCST (a policy gradient method) shows more stable gradient behavior and improved results over Gumbel ST, even without accessing the discriminator gradients directly. We also address the open question of automatic evaluation for these models, introduce a new semantic score, and demonstrate its strong correlation with human judgement. As an evaluation paradigm, we suggest that an important criterion is the ability of a captioner to generalize to compositions between objects that do not usually occur together, for which we introduce a captioned Out of Context (OOC) test set. The OOC dataset combined with our semantic score is a new benchmark for the captioning community. Under this OOC benchmark, and the traditional MSCOCO dataset, we show that SCST performs strongly in both semantic score and human evaluation.

* Authors Equal Contribution

Regularized Kernel and Neural Sobolev Descent: Dynamic MMD Transport

May 30, 2018

Youssef Mroueh, Tom Sercu, Anant Raj

Abstract: We introduce Regularized Kernel and Neural Sobolev Descent for transporting a source distribution to a target distribution along smooth paths of minimum kinetic energy (defined by the Sobolev discrepancy), related to dynamic optimal transport. In the kernel version, we give a simple algorithm to perform the descent along gradients of the Sobolev critic, and show that it converges asymptotically to the target distribution in the MMD sense. In the neural version, we parametrize the Sobolev critic with a neural network with input gradient norm constrained in expectation. We show in theory and experiments that regularization has an important role in favoring smooth transitions between distributions, avoiding large discrete jumps. Our analysis could provide a new perspective on the impact of critic updates (early stopping) on the paths to equilibrium in the GAN setting.
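
Convergence "in the MMD sense" refers to the maximum mean discrepancy between the transported samples and the target. As a rough illustration only, the sketch below computes the biased sample estimate of squared MMD for one-dimensional samples. The Gaussian kernel and the bandwidth `sigma` are assumptions made for this example and are not taken from the paper.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel on scalars; sigma is an assumed bandwidth."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))

def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of MMD^2 = E k(x,x') + E k(y,y') - 2 E k(x,y)."""
    def mean_kernel(a, b):
        return sum(gaussian_kernel(s, t, sigma) for s in a for t in b) / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

source = [0.0, 0.1, -0.1]
target = [2.0, 2.1, 1.9]
far = mmd_squared(source, target)
same = mmd_squared(target, target)
```

A descent procedure of the kind described would be expected to drive a quantity like `far` toward the value of `same`, which is zero, as the source samples move onto the target.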

Regularized Finite Dimensional Kernel Sobolev Discrepancy

May 16, 2018

Youssef Mroueh

Abstract: We show in this note that the Sobolev Discrepancy introduced in Mroueh et al. in the context of generative adversarial networks is actually the weighted negative Sobolev norm $\|\cdot\|_{\dot{H}^{-1}(\nu_q)}$, which is known to linearize the Wasserstein $W_2$ distance and plays a fundamental role in the dynamic formulation of optimal transport of Benamou and Brenier. Given a kernel with a finite dimensional feature map, we show that the Sobolev discrepancy can be approximated from finite samples. Assuming this discrepancy is finite, the error depends on the approximation error in the function space induced by the finite dimensional feature space kernel and on a statistical error due to the finite sample approximation.

Semi-Supervised Learning with IPM-based GANs: an Empirical Study

Dec 07, 2017

Tom Sercu, Youssef Mroueh

Abstract: We present an empirical investigation of a recent class of Generative Adversarial Networks (GANs) using Integral Probability Metrics (IPM) and their performance for semi-supervised learning. IPM-based GANs like Wasserstein GAN, Fisher GAN and Sobolev GAN have desirable properties in terms of theoretical understanding, training stability, and a meaningful loss. In this work we investigate how the design of the critic (or discriminator) influences the performance in semi-supervised learning. We distill three key take-aways which are important for good SSL performance: (1) the K+1 formulation, (2) avoiding batch normalization in the critic and (3) avoiding gradient penalty constraints on the classification layer.

* Appeared at NIPS 2017 Workshop: Deep Learning: Bridging Theory and Practice

Self-critical Sequence Training for Image Captioning

Nov 16, 2017

Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jarret Ross, Vaibhava Goel

Abstract: Recently it has been shown that policy-gradient methods for reinforcement learning can be utilized to train deep end-to-end systems directly on non-differentiable metrics for the task at hand. In this paper we consider the problem of optimizing image captioning systems using reinforcement learning, and show that by carefully optimizing our systems using the test metrics of the MSCOCO task, significant gains in performance can be realized. Our systems are built using a new optimization approach that we call self-critical sequence training (SCST). SCST is a form of the popular REINFORCE algorithm that, rather than estimating a "baseline" to normalize the rewards and reduce variance, utilizes the output of its own test-time inference algorithm to normalize the rewards it experiences. Using this approach, estimating the reward signal (as actor-critic methods must do) and estimating normalization (as REINFORCE algorithms typically do) are avoided, while at the same time harmonizing the model with respect to its test-time inference procedure. Empirically we find that directly optimizing the CIDEr metric with SCST and greedy decoding at test-time is highly effective. Our results on the MSCOCO evaluation server establish a new state-of-the-art on the task, improving the best result in terms of CIDEr from 104.9 to 114.7.

* CVPR 2017 + additional analysis + fixed baseline results, 16 pages
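
The core idea, using the reward of the model's own greedy output as the REINFORCE baseline, can be sketched on a toy single-token "captioner". This is an illustrative reduction, not the authors' implementation: a real system samples whole captions from an LSTM and scores them with CIDEr, whereas here `reward` is a hypothetical lookup table and the policy is a plain softmax over three tokens.

```python
import math

def softmax(logits):
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def scst_logit_gradient(logits, sampled, reward):
    """Gradient w.r.t. the logits of the surrogate loss
    -(r(sampled) - r(greedy)) * log p(sampled)."""
    probs = softmax(logits)
    greedy = max(range(len(probs)), key=probs.__getitem__)  # test-time output
    advantage = reward(sampled) - reward(greedy)            # self-critical baseline
    # d/dz_j of -log p(sampled) is probs[j] - onehot(sampled)[j].
    return [advantage * (p - (1.0 if j == sampled else 0.0))
            for j, p in enumerate(probs)]

# Hypothetical rewards per token (stand-in for a caption metric such as CIDEr).
reward_table = [0.2, 1.0, 0.0]
reward = reward_table.__getitem__
logits = [2.0, 1.0, 0.0]

grad_better = scst_logit_gradient(logits, sampled=1, reward=reward)
grad_same = scst_logit_gradient(logits, sampled=0, reward=reward)
```

A sample that beats the greedy output (`sampled=1`) yields a negative gradient on its own logit, so a gradient descent step raises its probability; a sample identical to the greedy output contributes no gradient at all, and no separate baseline network is needed.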

Sobolev GAN

Nov 14, 2017

Youssef Mroueh, Chun-Liang Li, Tom Sercu, Anant Raj, Yu Cheng

Abstract: We propose a new Integral Probability Metric (IPM) between distributions: the Sobolev IPM. The Sobolev IPM compares the mean discrepancy of two distributions for functions (critic) restricted to a Sobolev ball defined with respect to a dominant measure $\mu$. We show that the Sobolev IPM compares two distributions in high dimensions based on weighted conditional Cumulative Distribution Functions (CDF) of each coordinate on a leave one out basis. The dominant measure $\mu$ plays a crucial role as it defines the support on which conditional CDFs are compared. Sobolev IPM can be seen as an extension of the one dimensional Von-Mises Cramér statistics to high dimensional distributions. We show how Sobolev IPM can be used to train Generative Adversarial Networks (GANs). We then exploit the intrinsic conditioning implied by Sobolev IPM in text generation. Finally we show that a variant of Sobolev GAN achieves competitive results in semi-supervised learning on CIFAR-10, thanks to the smoothness enforced on the critic by Sobolev GAN which relates to Laplacian regularization.

Fisher GAN

Nov 03, 2017

Youssef Mroueh, Tom Sercu

Abstract: Generative Adversarial Networks (GANs) are powerful models for learning complex distributions. Stable training of GANs has been addressed in many recent works which explore different metrics between distributions. In this paper we introduce Fisher GAN which fits within the Integral Probability Metrics (IPM) framework for training GANs. Fisher GAN defines a critic with a data dependent constraint on its second order moments. We show in this paper that Fisher GAN allows for stable and time efficient training that does not compromise the capacity of the critic, and does not need data independent constraints such as weight clipping. We analyze our Fisher IPM theoretically and provide an algorithm based on Augmented Lagrangian for Fisher GAN. We validate our claims on both image sample generation and semi-supervised classification using Fisher GAN.

* Published at NIPS 2017. v2: added inception score table & plot update, relation to f-gan, illustration (Figure 1). v3: added strong SSL results for critic without batch normalization
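
A rough sketch of what an augmented Lagrangian objective with a second-order-moment constraint can look like on a batch of critic outputs follows. The specific form used here (equal 1/2 weights on real and generated second moments, a target value of 1, multiplier `lam`, penalty weight `rho`) is an assumption for illustration; consult the paper for the exact objective and update rules.

```python
def fisher_augmented_lagrangian(f_real, f_fake, lam, rho):
    """Return (objective, violation) for critic outputs on real and fake batches.

    objective = mean_gap + lam * violation - (rho / 2) * violation**2
    where violation = 1 - (0.5 * E[f(real)^2] + 0.5 * E[f(fake)^2]).
    """
    mean_real = sum(f_real) / len(f_real)
    mean_fake = sum(f_fake) / len(f_fake)
    mean_gap = mean_real - mean_fake                  # IPM numerator
    second_real = sum(v * v for v in f_real) / len(f_real)
    second_fake = sum(v * v for v in f_fake) / len(f_fake)
    violation = 1.0 - (0.5 * second_real + 0.5 * second_fake)  # data dependent
    objective = mean_gap + lam * violation - 0.5 * rho * violation ** 2
    return objective, violation

# Critic outputs that satisfy the constraint exactly.
obj_ok, viol_ok = fisher_augmented_lagrangian([1.0, 1.0], [-1.0, -1.0], lam=0.5, rho=1.0)
# Critic outputs scaled up: larger mean gap, but the constraint is violated.
obj_big, viol_big = fisher_augmented_lagrangian([2.0, 2.0], [-2.0, -2.0], lam=0.5, rho=1.0)
```

The second call shows why the constraint matters: simply scaling the critic inflates the mean gap, but the penalty terms outweigh that gain, so the critic cannot win by growing its outputs. The constraint is computed from the data rather than imposed on the weights, in contrast to weight clipping.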

McGan: Mean and Covariance Feature Matching GAN

Jun 08, 2017

Youssef Mroueh, Tom Sercu, Vaibhava Goel

Abstract: We introduce new families of Integral Probability Metrics (IPM) for training Generative Adversarial Networks (GAN). Our IPMs are based on matching statistics of distributions embedded in a finite dimensional feature space. Mean and covariance feature matching IPMs allow for stable training of GANs, which we will call McGan. McGan minimizes a meaningful loss between distributions.

* 15 pages; published at ICML 2017
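
As a small illustration of mean feature matching, the sketch below embeds samples with a fixed feature map and measures the Euclidean distance between the two mean embeddings. For linear critics f(x) = <v, Phi(x)> with ||v||_2 <= 1, that distance is the largest achievable gap between the critic's means on the two sample sets. The feature map `feature_map` is a hypothetical stand-in (the paper learns the embedding with a neural network), and covariance matching is not shown.

```python
import math

def feature_map(x):
    """Hypothetical fixed embedding Phi(x) = (x, x^2) for scalar inputs."""
    return [x, x * x]

def mean_embedding(samples):
    feats = [feature_map(x) for x in samples]
    dim = len(feats[0])
    return [sum(f[j] for f in feats) / len(feats) for j in range(dim)]

def mean_matching_ipm(samples_p, samples_q):
    """L2 distance between mean embeddings of two sample sets."""
    mu_p = mean_embedding(samples_p)
    mu_q = mean_embedding(samples_q)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mu_p, mu_q)))

# Same first moment, different second moment: only the x^2 feature separates them.
gap = mean_matching_ipm([0.0, 2.0], [1.0, 1.0])
zero = mean_matching_ipm([1.0, 1.0], [1.0, 1.0])
```

The two toy sample sets share the same mean, so a feature map containing only `x` would report no difference; the choice of embedding determines which statistics the metric can see.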

Local Group Invariant Representations via Orbit Embeddings

May 24, 2017

Anant Raj, Abhishek Kumar, Youssef Mroueh, P. Thomas Fletcher, Bernhard Schölkopf

Abstract: Invariance to nuisance transformations is one of the desirable properties of effective representations. We consider transformations that form a group and propose an approach based on kernel methods to derive local group invariant representations. Locality is achieved by defining a suitable probability distribution over the group which in turn induces distributions in the input feature space. We learn a decision function over these distributions by appealing to the powerful framework of kernel methods and generate local invariant random feature maps via kernel approximations. We show uniform convergence bounds for kernel approximation and provide excess risk bounds for learning with these features. We evaluate our method on three real datasets, including Rotated MNIST and CIFAR-10, and observe that it outperforms competing kernel based approaches. The proposed method also outperforms deep CNN on Rotated-MNIST and performs comparably to the recently proposed group-equivariant CNN.

* AISTATS 2017 accepted version including appendix, 18 pages, 1 figure
