Armand Joulin

INRIA - Ecole Normale Superieure

Leveraging Large-Scale Uncurated Data for Unsupervised Pre-training of Visual Features

May 03, 2019

Mathilde Caron, Piotr Bojanowski, Julien Mairal, Armand Joulin

Abstract:Pre-training general-purpose visual features with convolutional neural networks without relying on annotations is a challenging and important task. Most recent efforts in unsupervised feature learning have focused on either small or highly curated datasets like ImageNet, whereas using uncurated raw datasets was found to decrease the feature quality when evaluated on a transfer task. Our goal is to bridge the performance gap between unsupervised methods trained on curated data, which are costly to obtain, and massive raw datasets that are easily available. To that effect, we propose a new unsupervised approach which leverages self-supervision and clustering to capture complementary statistics from large-scale data. We validate our approach on 96 million images from YFCC100M, achieving state-of-the-art results among unsupervised methods on standard benchmarks, which confirms the potential of unsupervised learning when only uncurated data are available. We also show that pre-training a supervised VGG-16 with our method achieves 74.6% top-1 accuracy on the validation set of ImageNet classification, which is an improvement of +0.7% over the same network trained from scratch.

Cooperative Learning of Disjoint Syntax and Semantics

Feb 25, 2019

Serhii Havrylov, Germán Kruszewski, Armand Joulin

Abstract:There has been considerable attention devoted to models that learn to jointly infer an expression's syntactic structure and its semantics. Yet, Nangia and Bowman (2018) have recently shown that the current best systems fail to learn the correct parsing strategy on mathematical expressions generated from a simple context-free grammar. In this work, we present a recursive model inspired by Choi et al. (2018) that reaches near-perfect accuracy on this task. Our model is composed of two separate modules for syntax and semantics. They are cooperatively trained with standard continuous and discrete optimization schemes. Our model does not require any linguistic structure for supervision and its recursive nature allows for out-of-domain generalization with little loss in performance. Additionally, our approach performs competitively on several natural language tasks, such as Natural Language Inference or Sentiment Analysis.

* The paper was accepted at NAACL-HLT 2019

Unsupervised Hyperalignment for Multilingual Word Embeddings

Nov 02, 2018

Jean Alaux, Edouard Grave, Marco Cuturi, Armand Joulin

Abstract:We consider the problem of aligning continuous word representations, learned in multiple languages, to a common space. It was recently shown that, in the case of two languages, it is possible to learn such a mapping without supervision. This paper extends this line of work to the problem of aligning multiple languages to a common space. A solution is to independently map all languages to a pivot language. Unfortunately, this degrades the quality of indirect word translation. We thus propose a novel formulation that ensures composable mappings, leading to better alignments. We evaluate our method by jointly aligning word vectors in eleven languages, showing consistent improvement with indirect mappings while maintaining competitive performance on direct word translation.

* Submitted to ICLR

Loss in Translation: Learning Bilingual Word Mapping with a Retrieval Criterion

Sep 05, 2018

Armand Joulin, Piotr Bojanowski, Tomas Mikolov, Herve Jegou, Edouard Grave

Abstract:Continuous word representations learned separately on distinct languages can be aligned so that their words become comparable in a common space. Existing works typically solve a least-squares regression problem to learn a rotation aligning a small bilingual lexicon, and use a retrieval criterion for inference. In this paper, we propose a unified formulation that directly optimizes a retrieval criterion in an end-to-end fashion. Our experiments on standard benchmarks show that our approach outperforms the state of the art on word translation, with the biggest improvements observed for distant language pairs such as English-Chinese.
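
The baseline that this abstract attributes to existing works (a least-squares rotation fitted on a small lexicon, followed by retrieval at inference) can be sketched as below. This is a minimal illustration on synthetic vectors, not the paper's proposed method: the function names `procrustes_rotation` and `translate`, the toy data, and the use of plain cosine nearest-neighbor retrieval are all assumptions made for the example.

```python
import numpy as np

def procrustes_rotation(X, Y):
    """Orthogonal W minimizing ||X @ W.T - Y||_F, rows of X and Y being paired vectors."""
    # Closed-form solution: take the SVD of Y^T X and drop the singular values.
    U, _, Vt = np.linalg.svd(Y.T @ X)
    return U @ Vt

def translate(W, src_vecs, tgt_vecs):
    """Map source vectors with W, then return the index of the nearest target by cosine."""
    mapped = src_vecs @ W.T
    mapped = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    tgt = tgt_vecs / np.linalg.norm(tgt_vecs, axis=1, keepdims=True)
    return np.argmax(mapped @ tgt.T, axis=1)

# Synthetic "lexicon": target vectors are an exact rotation of the source vectors.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))                      # hypothetical source embeddings
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))      # hidden ground-truth rotation
Y = X @ Q.T                                           # hypothetical target embeddings

W = procrustes_rotation(X, Y)
predicted = translate(W, X, Y)
```

On this noise-free toy problem the fitted `W` coincides with the hidden rotation, so every source row retrieves its own pair. Real embeddings are only approximately related by a rotation, which is where the choice of retrieval criterion starts to matter.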

Deep Clustering for Unsupervised Learning of Visual Features

Jul 15, 2018

Mathilde Caron, Piotr Bojanowski, Armand Joulin, Matthijs Douze

Abstract:Clustering is a class of unsupervised learning methods that has been extensively applied and studied in computer vision. Little work has been done to adapt it to the end-to-end training of visual features on large scale datasets. In this work, we present DeepCluster, a clustering method that jointly learns the parameters of a neural network and the cluster assignments of the resulting features. DeepCluster iteratively groups the features with a standard clustering algorithm, k-means, and uses the subsequent assignments as supervision to update the weights of the network. We apply DeepCluster to the unsupervised training of convolutional neural networks on large datasets like ImageNet and YFCC100M. The resulting model outperforms the current state of the art by a significant margin on all the standard benchmarks.

* Accepted at ECCV 2018
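
The alternation described in the DeepCluster abstract (cluster the current features with k-means, then use the assignments as labels to update the network) can be sketched on toy data as follows. Everything here is an assumption made for illustration: the one-layer `tanh` feature map stands in for a convolutional network, the data are synthetic Gaussian blobs, and the helper names, learning rate, and the choice to reset the classifier head each epoch are hypothetical.

```python
import numpy as np

def kmeans(feats, k, rng, n_iter=10):
    """Plain Lloyd's k-means; returns one cluster index per row of feats."""
    centroids = feats[rng.choice(len(feats), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = ((feats[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for c in range(k):
            members = feats[assign == c]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[c] = members.mean(0)
    return assign

def train_step(X, labels, theta, head, lr=0.1):
    """One full-batch gradient step of softmax cross-entropy on the pseudo-labels."""
    feats = np.tanh(X @ theta)                 # toy stand-in for the network features
    logits = feats @ head
    logits = logits - logits.max(1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(1, keepdims=True)
    rows = np.arange(len(X))
    loss = -np.log(probs[rows, labels]).mean()
    grad_logits = probs.copy()
    grad_logits[rows, labels] -= 1.0
    grad_logits /= len(X)
    grad_head = feats.T @ grad_logits
    grad_feats = grad_logits @ head.T
    grad_theta = X.T @ (grad_feats * (1.0 - feats ** 2))
    theta -= lr * grad_theta                   # in-place parameter updates
    head -= lr * grad_head
    return loss

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(m, 0.5, size=(40, 5)) for m in (-2.0, 0.0, 2.0)])
k, d_feat = 3, 4
theta = rng.normal(0.0, 0.5, size=(5, d_feat))
head = np.zeros((d_feat, k))

for epoch in range(5):
    # Step 1: group the current features with k-means to get pseudo-labels.
    assign = kmeans(np.tanh(X @ theta), k, rng)
    # Cluster ids are arbitrary from one epoch to the next, so this sketch resets the head.
    head[:] = 0.0
    # Step 2: use the assignments as supervision to update the weights.
    for _ in range(20):
        loss = train_step(X, assign, theta, head)
```

The sketch only shows the control flow of the two alternating steps; it makes no claim about feature quality on this toy problem.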

Unsupervised Alignment of Embeddings with Wasserstein Procrustes

May 29, 2018

Edouard Grave, Armand Joulin, Quentin Berthet

Abstract:We consider the task of aligning two sets of points in high dimension, which has many applications in natural language processing and computer vision. As an example, it was recently shown that it is possible to infer a bilingual lexicon, without supervised data, by aligning word embeddings trained on monolingual data. These recent advances are based on adversarial training to learn the mapping between the two embeddings. In this paper, we propose to use an alternative formulation, based on the joint estimation of an orthogonal matrix and a permutation matrix. While this problem is not convex, we propose to initialize our optimization algorithm by using a convex relaxation, traditionally considered for the graph isomorphism problem. We propose a stochastic algorithm to minimize our cost function on large-scale problems. Finally, we evaluate our method on the problem of unsupervised word translation, by aligning word embeddings trained on monolingual data. On this task, our method obtains state-of-the-art results, while requiring fewer computational resources than competing approaches.

Learning Word Vectors for 157 Languages

Mar 28, 2018

Edouard Grave, Piotr Bojanowski, Prakhar Gupta, Armand Joulin, Tomas Mikolov

Abstract:Distributed word representations, or word vectors, have recently been applied to many tasks in natural language processing, leading to state-of-the-art performance. A key ingredient to the successful application of these representations is to train them on very large corpora, and use these pre-trained models in downstream tasks. In this paper, we describe how we trained such high-quality word representations for 157 languages. We used two sources of data to train these models: the free online encyclopedia Wikipedia and data from the Common Crawl project. We also introduce three new word analogy datasets to evaluate these word vectors, for French, Hindi and Polish. Finally, we evaluate our pre-trained word vectors on 10 languages for which evaluation datasets exist, showing very strong performance compared to previous models.

* Accepted to LREC

Advances in Pre-Training Distributed Word Representations

Dec 26, 2017

Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch, Armand Joulin

Abstract:Many Natural Language Processing applications nowadays rely on pre-trained word representations estimated from large text corpora such as news collections, Wikipedia and Web Crawl. In this paper, we show how to train high-quality word vector representations by using a combination of known tricks that are, however, rarely used together. The main result of our work is the new set of publicly available pre-trained models that outperform the current state of the art by a large margin on a number of tasks.

Unbounded cache model for online language modeling with open vocabulary

Nov 07, 2017

Edouard Grave, Moustapha Cisse, Armand Joulin

Abstract:Recently, continuous cache models were proposed as extensions to recurrent neural network language models, to adapt their predictions to local changes in the data distribution. These models only capture the local context, of up to a few thousand tokens. In this paper, we propose an extension of continuous cache models, which can scale to larger contexts. In particular, we use a large-scale non-parametric memory component that stores all the hidden activations seen in the past. We leverage recent advances in approximate nearest neighbor search and quantization algorithms to store millions of representations while searching them efficiently. We conduct extensive experiments showing that our approach significantly improves the perplexity of pre-trained language models on new distributions, and can scale efficiently to much larger contexts than previously proposed local cache models.

* Accepted to NIPS 2017
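
The idea of a memory that stores past hidden activations and is queried by nearest-neighbor search can be sketched as below. This is a simplified stand-in, not the paper's system: it uses exact brute-force search where the abstract mentions approximate search and quantization, and the class name `BruteForceCache`, the `exp(theta * dot product)` weighting, the neighbor count `k`, and the interpolation weight `lam` are all assumptions chosen for the example.

```python
import numpy as np

class BruteForceCache:
    """Stores (hidden state, following word) pairs and votes with the k nearest ones."""

    def __init__(self, vocab_size, theta=1.0, k=2):
        self.vocab_size = vocab_size
        self.theta = theta          # assumed sharpness of the similarity weighting
        self.k = k                  # assumed number of neighbors to retrieve
        self.keys = []              # hidden activations seen so far
        self.next_words = []        # word id that followed each stored activation

    def add(self, hidden, next_word):
        self.keys.append(np.asarray(hidden, dtype=float))
        self.next_words.append(int(next_word))

    def probs(self, hidden):
        """Distribution over the vocabulary induced by the k most similar stored states."""
        out = np.zeros(self.vocab_size)
        if not self.keys:
            return out
        scores = self.theta * (np.stack(self.keys) @ np.asarray(hidden, dtype=float))
        top = np.argsort(scores)[-self.k:]                 # exact search, for illustration
        weights = np.exp(scores[top] - scores[top].max())
        np.add.at(out, np.array(self.next_words)[top], weights)
        return out / out.sum()

def interpolate(lm_probs, cache_probs, lam=0.3):
    """Linear interpolation between the base language model and the cache."""
    return (1.0 - lam) * lm_probs + lam * cache_probs

cache = BruteForceCache(vocab_size=5)
cache.add([1.0, 0.0, 0.0], next_word=2)
cache.add([0.0, 1.0, 0.0], next_word=4)
cache.add([1.0, 0.1, 0.0], next_word=2)

lm_probs = np.full(5, 0.2)                                 # hypothetical uniform base model
mixed = interpolate(lm_probs, cache.probs([1.0, 0.0, 0.0]))
```

Here both retrieved neighbors were followed by word 2, so the cache shifts probability mass toward that word while the mixture remains a valid distribution.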

Fast Linear Model for Knowledge Graph Embeddings

Oct 30, 2017

Armand Joulin, Edouard Grave, Piotr Bojanowski, Maximilian Nickel, Tomas Mikolov

Abstract:This paper shows that a simple baseline based on a Bag-of-Words (BoW) representation learns surprisingly good knowledge graph embeddings. By casting knowledge base completion and question answering as supervised classification problems, we observe that modeling co-occurrences of entities and relations leads to state-of-the-art performance with a training time of a few minutes using the open-source library fastText.

* Submitted to AKBC 2017
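
Casting knowledge base completion as classification, as the abstract describes, can be illustrated by treating the head entity and relation as a bag of tokens and the tail entity as the label. The sketch below is hypothetical: the exact feature encoding used in the paper is not given here, the triples are made up, and the count-based voter is a deliberately crude stand-in for a trained linear model. The only fastText convention relied on is its default `__label__` prefix for supervised training lines.

```python
from collections import Counter, defaultdict

def triple_to_example(head, relation, tail):
    """Format one (head, relation, tail) triple as a supervised line: label, then tokens."""
    return f"__label__{tail} {head} {relation}"

def fit_cooccurrence(triples):
    """Count how often each input token (head or relation) co-occurs with each tail."""
    counts = defaultdict(Counter)
    for head, relation, tail in triples:
        for token in (head, relation):
            counts[token][tail] += 1
    return counts

def predict_tail(counts, head, relation):
    """Sum the co-occurrence counts of both tokens and return the best-scoring tail."""
    votes = Counter()
    for token in (head, relation):
        votes.update(counts.get(token, {}))
    return votes.most_common(1)[0][0] if votes else None

# Made-up toy knowledge base.
triples = [
    ("Paris", "capital_of", "France"),
    ("Lyon", "city_in", "France"),
    ("Berlin", "capital_of", "Germany"),
    ("Munich", "city_in", "Germany"),
]
lines = [triple_to_example(*t) for t in triples]
counts = fit_cooccurrence(triples)
guess = predict_tail(counts, "Paris", "capital_of")
```

The strings in `lines` are the kind of input a bag-of-words text classifier consumes, while `predict_tail` shows in the simplest possible form why co-occurrence statistics alone already carry signal for completion.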
