Guillaume Lample

Augmenting Self-attention with Persistent Memory

Jul 02, 2019

Sainbayar Sukhbaatar, Edouard Grave, Guillaume Lample, Hervé Jégou, Armand Joulin

Abstract: Transformer networks have led to important progress in language modeling and machine translation. These models include two consecutive modules, a feed-forward layer and a self-attention layer. The latter allows the network to capture long-term dependencies and is often regarded as the key ingredient in the success of Transformers. Building upon this intuition, we propose a new model that solely consists of attention layers. More precisely, we augment the self-attention layers with persistent memory vectors that play a similar role as the feed-forward layer. Thanks to these vectors, we can remove the feed-forward layer without degrading the performance of a transformer. Our evaluation shows the benefits brought by our model on standard character- and word-level language modeling benchmarks.
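The idea of augmenting self-attention with persistent memory vectors can be illustrated with a small sketch. It assumes the persistent vectors are simply concatenated to the context keys and values before the softmax, and it omits multiple heads, causal masking, and positional information. The function names and the toy numbers are hypothetical, and the memory vectors are fixed here, whereas in a trained model they would be learned parameters.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_with_persistent_memory(queries, keys, values, mem_keys, mem_values):
    """Single-head attention over context keys/values plus persistent ones.

    mem_keys and mem_values do not depend on the input sequence; they are
    appended to the context keys and values, so every query attends over
    both sets with one shared softmax.
    """
    all_keys = keys + mem_keys        # list concatenation
    all_values = values + mem_values
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim_out = len(all_values[0])
    outputs = []
    for q in queries:
        weights = softmax([scale * dot(q, k) for k in all_keys])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, all_values)) for j in range(dim_out)]
        )
    return outputs


# Toy usage: one context token and one persistent vector.
# A zero query scores every key equally, so the output is the plain
# average of the context value and the persistent value.
out = attention_with_persistent_memory(
    queries=[[0.0, 0.0]],
    keys=[[1.0, 0.0]],
    values=[[2.0, 0.0]],
    mem_keys=[[0.0, 1.0]],
    mem_values=[[0.0, 4.0]],
)
print(out)  # [[1.0, 2.0]]
```

Because the persistent entries take part in the same softmax as the context, they act like input-independent key-value slots, which is one way to see how they could stand in for a feed-forward layer.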

Two New Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English

Feb 04, 2019

Francisco Guzmán, Peng-Jen Chen, Myle Ott, Juan Pino, Guillaume Lample, Philipp Koehn, Vishrav Chaudhary, Marc'Aurelio Ranzato

Abstract: The vast majority of language pairs in the world are low-resource because they have little, if any, parallel data available. Unfortunately, machine translation (MT) systems do not currently work well in this setting. Besides the technical challenges of learning with limited supervision, there is also another challenge: it is very difficult to evaluate methods trained on low-resource language pairs because there are very few freely and publicly available benchmarks. In this work, we take sentences from Wikipedia pages and introduce new evaluation datasets in two very low-resource language pairs, Nepali-English and Sinhala-English. These are languages with very different morphology and syntax, for which little out-of-domain parallel data is available and for which relatively large amounts of monolingual data are freely available. We describe our process to collect and cross-check the quality of translations, and we report baseline performance using several learning settings: fully supervised, weakly supervised, semi-supervised, and fully unsupervised. Our experiments demonstrate that current state-of-the-art methods perform rather poorly on this benchmark, posing a challenge to the research community working on low-resource MT. Data and code to reproduce our experiments are available at https://github.com/facebookresearch/flores.

Cross-lingual Language Model Pretraining

Jan 22, 2019

Guillaume Lample, Alexis Conneau

Abstract: Recent studies have demonstrated the efficiency of generative pretraining for English natural language understanding. In this work, we extend this approach to multiple languages and show the effectiveness of cross-lingual pretraining. We propose two methods to learn cross-lingual language models (XLMs): one unsupervised that only relies on monolingual data, and one supervised that leverages parallel data with a new cross-lingual language model objective. We obtain state-of-the-art results on cross-lingual classification, unsupervised and supervised machine translation. On XNLI, our approach pushes the state of the art by an absolute gain of 4.9% accuracy. On unsupervised machine translation, we obtain 34.3 BLEU on WMT'16 German-English, improving the previous state of the art by more than 9 BLEU. On supervised machine translation, we obtain a new state of the art of 38.5 BLEU on WMT'16 Romanian-English, outperforming the previous best approach by more than 4 BLEU. Our code and pretrained models will be made publicly available.

Multiple-Attribute Text Style Transfer

Nov 01, 2018

Sandeep Subramanian, Guillaume Lample, Eric Michael Smith, Ludovic Denoyer, Marc'Aurelio Ranzato, Y-Lan Boureau

Abstract: The dominant approach to unsupervised "style transfer" in text is based on the idea of learning a latent representation, which is independent of the attributes specifying its "style". In this paper, we show that this condition is not necessary and is not always met in practice, even with domain adversarial training that explicitly aims at learning such disentangled representations. We thus propose a new model that controls several factors of variation in textual data where this condition on disentanglement is replaced with a simpler mechanism based on back-translation. Our method allows control over multiple attributes, like gender, sentiment, product type, etc., and a more fine-grained control on the trade-off between content preservation and change of style with a pooling operator in the latent space. Our experiments demonstrate that the fully entangled model produces better generations, even when tested on new and more challenging benchmarks comprising reviews with multiple sentences and multiple attributes.

XNLI: Evaluating Cross-lingual Sentence Representations

Sep 13, 2018

Alexis Conneau, Guillaume Lample, Ruty Rinott, Adina Williams, Samuel R. Bowman, Holger Schwenk, Veselin Stoyanov

Abstract: State-of-the-art natural language processing systems rely on supervision in the form of annotated data to learn competent models. These models are generally trained on data in a single language (usually English), and cannot be directly used beyond that language. Since collecting data in every language is not realistic, there has been a growing interest in cross-lingual language understanding (XLU) and low-resource cross-language transfer. In this work, we construct an evaluation set for XLU by extending the development and test sets of the Multi-Genre Natural Language Inference Corpus (MultiNLI) to 15 languages, including low-resource languages such as Swahili and Urdu. We hope that our dataset, dubbed XNLI, will catalyze research in cross-lingual sentence understanding by providing an informative standard evaluation task. In addition, we provide several baselines for multilingual sentence understanding, including two based on machine translation systems, and two that use parallel data to train aligned multilingual bag-of-words and LSTM encoders. We find that XNLI represents a practical and challenging evaluation suite, and that directly translating the test data yields the best performance among available baselines.

* EMNLP 2018

Phrase-Based & Neural Unsupervised Machine Translation

Aug 13, 2018

Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato

Abstract: Machine translation systems achieve near human-level performance on some languages, yet their effectiveness strongly relies on the availability of large amounts of parallel sentences, which hinders their applicability to the majority of language pairs. This work investigates how to learn to translate when having access to only large monolingual corpora in each language. We propose two model variants, a neural and a phrase-based model. Both versions leverage a careful initialization of the parameters, the denoising effect of language models and automatic generation of parallel data by iterative back-translation. These models are significantly better than methods from the literature, while being simpler and having fewer hyper-parameters. On the widely used WMT'14 English-French and WMT'16 German-English benchmarks, our models respectively obtain 28.1 and 25.2 BLEU points without using a single parallel sentence, outperforming the state of the art by more than 11 BLEU points. On low-resource languages like English-Urdu and English-Romanian, our methods achieve even better results than semi-supervised and supervised approaches leveraging the paucity of available bitexts. Our code for NMT and PBSMT is publicly available.

* EMNLP 2018

What you can cram into a single vector: Probing sentence embeddings for linguistic properties

Jul 08, 2018

Alexis Conneau, German Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni

Abstract: Although much effort has recently been devoted to training high-quality sentence embeddings, we still have a poor understanding of what they are capturing. "Downstream" tasks, often based on sentence classification, are commonly used to evaluate the quality of sentence representations. However, the complexity of the tasks makes it difficult to infer what kind of information is present in the representations. We introduce here 10 probing tasks designed to capture simple linguistic features of sentences, and we use them to study embeddings generated by three different encoders trained in eight distinct ways, uncovering intriguing properties of both encoders and training methods.

* ACL 2018

Unsupervised Machine Translation Using Monolingual Corpora Only

Apr 13, 2018

Guillaume Lample, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato

Abstract: Machine translation has recently achieved impressive performance thanks to advances in deep learning and the availability of large-scale parallel corpora. There have been numerous attempts to extend these successes to low-resource language pairs, yet these still require tens of thousands of parallel sentences. In this work, we take this research direction to the extreme and investigate whether it is possible to learn to translate even without any parallel data. We propose a model that takes sentences from monolingual corpora in two different languages and maps them into the same latent space. By learning to reconstruct in both languages from this shared feature space, the model effectively learns to translate without using any labeled data. We demonstrate our model on two widely used datasets and two language pairs, reporting BLEU scores of 32.8 and 15.1 on the Multi30k and WMT English-French datasets, without using even a single parallel sentence at training time.

* ICLR 2018

Word Translation Without Parallel Data

Jan 30, 2018

Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou

Abstract: State-of-the-art methods for learning cross-lingual word embeddings have relied on bilingual dictionaries or parallel corpora. Recent studies showed that the need for parallel data supervision can be alleviated with character-level information. While these methods showed encouraging results, they are not on par with their supervised counterparts and are limited to pairs of languages sharing a common alphabet. In this work, we show that we can build a bilingual dictionary between two languages without using any parallel corpora, by aligning monolingual word embedding spaces in an unsupervised way. Without using any character information, our model even outperforms existing supervised methods on cross-lingual tasks for some language pairs. Our experiments demonstrate that our method also works very well for distant language pairs, like English-Russian or English-Chinese. We finally describe experiments on the English-Esperanto low-resource language pair, for which only a limited amount of parallel data exists, to show the potential impact of our method in fully unsupervised machine translation. Our code, embeddings and dictionaries are publicly available.

* ICLR 2018

Playing FPS Games with Deep Reinforcement Learning

Jan 29, 2018

Guillaume Lample, Devendra Singh Chaplot

Abstract: Advances in deep reinforcement learning have allowed autonomous agents to perform well on Atari games, often outperforming humans, using only raw pixels to make their decisions. However, most of these games take place in 2D environments that are fully observable to the agent. In this paper, we present the first architecture to tackle 3D environments in first-person shooter games, which involve partially observable states. Typically, deep reinforcement learning methods only utilize visual input for training. We present a method to augment these models to exploit game feature information, such as the presence of enemies or items, during the training phase. Our model is trained to simultaneously learn these features along with minimizing a Q-learning objective, which is shown to dramatically improve the training speed and performance of our agent. Our architecture is also modularized to allow different models to be independently trained for different phases of the game. We show that the proposed architecture substantially outperforms built-in AI agents of the game as well as humans in deathmatch scenarios.

* The authors contributed equally to this work
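The joint training signal described in the abstract above, a Q-learning objective combined with prediction of game features such as enemy presence, can be sketched as a sum of two losses. This is a minimal illustration under assumptions, not the authors' implementation: the squared one-step TD error, the binary cross-entropy on feature labels, the `feature_weight` coefficient, and all function names are hypothetical choices.

```python
import math


def td_loss(q_value, reward, next_q_values, gamma=0.99, terminal=False):
    """Squared one-step Q-learning error for a single transition."""
    # The target bootstraps from the best next-state action value
    # unless the episode ended on this transition.
    target = reward if terminal else reward + gamma * max(next_q_values)
    return (target - q_value) ** 2


def feature_loss(probs, labels):
    """Mean binary cross-entropy between predicted feature probabilities
    (e.g. "an enemy is visible") and 0/1 labels from the game engine."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(labels)


def joint_loss(q_value, reward, next_q_values, probs, labels,
               feature_weight=1.0, gamma=0.99, terminal=False):
    """Q-learning loss plus a weighted auxiliary feature-prediction loss.

    feature_weight is a hypothetical knob balancing the two terms.
    """
    return (td_loss(q_value, reward, next_q_values, gamma, terminal)
            + feature_weight * feature_loss(probs, labels))


# Toy usage: one transition with two binary game features
# (hypothetically "enemy visible" and "item visible").
loss = joint_loss(
    q_value=1.0, reward=1.0, next_q_values=[0.0, 2.0],
    probs=[0.9, 0.2], labels=[1, 0], gamma=0.5,
)
print(loss)
```

The feature labels are only needed to compute the auxiliary term during training. Action selection uses the Q-values alone, which matches the abstract's statement that game features are exploited during the training phase.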
