Guillaume Lample

DOBF: A Deobfuscation Pre-Training Objective for Programming Languages

Feb 16, 2021

Baptiste Roziere, Marie-Anne Lachaux, Marc Szafraniec, Guillaume Lample

Abstract:Recent advances in self-supervised learning have dramatically improved the state of the art on a wide variety of tasks. However, research in language model pre-training has mostly focused on natural languages, and it is unclear whether models like BERT and its variants provide the best pre-training when applied to other modalities, such as source code. In this paper, we introduce a new pre-training objective, DOBF, that leverages the structural aspect of programming languages and pre-trains a model to recover the original version of obfuscated source code. We show that models pre-trained with DOBF significantly outperform existing approaches on multiple downstream tasks, providing relative improvements of up to 13% in unsupervised code translation, and 24% in natural language code search. Incidentally, we found that our pre-trained model is able to de-obfuscate fully obfuscated source files, and to suggest descriptive variable names.
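The obfuscation step that produces DOBF-style training inputs can be illustrated with a minimal sketch. It assumes Python source, Python 3.9+ (for `ast.unparse`), and placeholder names of the form `FUNC_i` and `VAR_i`; the `Obfuscator` class and the mapping format are hypothetical and are not taken from the paper's released code. A model would be given the obfuscated text and trained to recover the original names.

```python
import ast
import builtins


class Obfuscator(ast.NodeTransformer):
    """Rename user-defined identifiers to uninformative placeholders (illustrative only)."""

    def __init__(self):
        self.mapping = {}                      # original name -> placeholder
        self.counts = {"FUNC": 0, "VAR": 0}

    def _rename(self, name, kind):
        # Reuse the placeholder if the name was already seen.
        if name not in self.mapping:
            self.mapping[name] = f"{kind}_{self.counts[kind]}"
            self.counts[kind] += 1
        return self.mapping[name]

    def visit_FunctionDef(self, node):
        node.name = self._rename(node.name, "FUNC")
        self.generic_visit(node)               # continue into arguments and body
        return node

    def visit_arg(self, node):
        node.arg = self._rename(node.arg, "VAR")
        return node

    def visit_Name(self, node):
        # Leave builtins such as len or range untouched.
        if node.id in self.mapping or not hasattr(builtins, node.id):
            node.id = self._rename(node.id, "VAR")
        return node


source = """
def total(values):
    result = 0
    for v in values:
        result = result + v
    return result
"""

ob = Obfuscator()
obfuscated = ast.unparse(ob.visit(ast.parse(source)))
print(obfuscated)    # model input: code with placeholder names
print(ob.mapping)    # names the model would be asked to recover
```

This sketch ignores attributes, classes, imports, and functions that are called before they are defined; a real obfuscator would need to handle those cases.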

Target Conditioning for One-to-Many Generation

Sep 21, 2020

Marie-Anne Lachaux, Armand Joulin, Guillaume Lample

Abstract:Neural Machine Translation (NMT) models often lack diversity in their generated translations, even when paired with a search algorithm such as beam search. A challenge is that the diversity in translations is caused by the variability in the target language, and cannot be inferred from the source sentence alone. In this paper, we propose to explicitly model this one-to-many mapping by conditioning the decoder of an NMT model on a latent variable that represents the domain of target sentences. The domain is a discrete variable generated by a target encoder that is jointly trained with the NMT model. The predicted domain of target sentences is given as input to the decoder during training. At inference, we can generate diverse translations by decoding with different domains. Unlike our strongest baseline (Shen et al., 2019), our method can scale to any number of domains without affecting the performance or the training time. We assess the quality and diversity of translations generated by our model with several metrics, on three different datasets.

Deep Differential System Stability -- Learning advanced computations from examples

Jun 11, 2020

François Charton, Amaury Hayat, Guillaume Lample

Abstract:Can advanced mathematical computations be learned from examples? Using transformers over large generated datasets, we train models to learn properties of differential systems, such as local stability, behavior at infinity and controllability. We achieve near perfect estimates of qualitative characteristics of the systems, and good approximations of numerical quantities, demonstrating that neural networks can learn advanced theorems and complex computations without built-in mathematical knowledge.

Unsupervised Translation of Programming Languages

Jun 05, 2020

Marie-Anne Lachaux, Baptiste Roziere, Lowik Chanussot, Guillaume Lample

Abstract:A transcompiler, also known as source-to-source translator, is a system that converts source code from a high-level programming language (such as C++ or Python) to another. Transcompilers are primarily used for interoperability, and to port codebases written in an obsolete or deprecated language (e.g. COBOL, Python 2) to a modern one. They typically rely on handcrafted rewrite rules, applied to the source code abstract syntax tree. Unfortunately, the resulting translations often lack readability, fail to respect the target language conventions, and require manual modifications in order to work properly. The overall translation process is time-consuming and requires expertise in both the source and target languages, making code-translation projects expensive. Although neural models significantly outperform their rule-based counterparts in the context of natural language translation, their applications to transcompilation have been limited due to the scarcity of parallel data in this domain. In this paper, we propose to leverage recent approaches in unsupervised machine translation to train a fully unsupervised neural transcompiler. We train our model on source code from open source GitHub projects, and show that it can translate functions between C++, Java, and Python with high accuracy. Our method relies exclusively on monolingual source code, requires no expertise in the source or target languages, and can easily be generalized to other programming languages. We also build and release a test set composed of 852 parallel functions, along with unit tests to check the correctness of translations. We show that our model outperforms rule-based commercial baselines by a significant margin.

Deep Learning for Symbolic Mathematics

Dec 02, 2019

Guillaume Lample, François Charton

Abstract:Neural networks have a reputation for being better at solving statistical or approximate problems than at performing calculations or working with symbolic data. In this paper, we show that they can be surprisingly good at more elaborate tasks in mathematics, such as symbolic integration and solving differential equations. We propose a syntax for representing mathematical problems, and methods for generating large datasets that can be used to train sequence-to-sequence models. We achieve results that outperform commercial Computer Algebra Systems such as Matlab or Mathematica.

Large Memory Layers with Product Keys

Jul 10, 2019

Guillaume Lample, Alexandre Sablayrolles, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou

Abstract:This paper introduces a structured memory which can be easily integrated into a neural network. The memory is very large by design and therefore significantly increases the capacity of the architecture, by up to a billion parameters with a negligible computational overhead. Its design and access pattern are based on product keys, which enable fast and exact nearest neighbor search. The ability to increase the number of parameters while keeping the same computational budget lets the overall system strike a better trade-off between prediction accuracy and computation efficiency both at training and test time. This memory layer allows us to tackle very large scale language modeling tasks. In our experiments we consider a dataset with up to 30 billion words, and we plug our memory layer in a state-of-the-art transformer-based architecture. In particular, we found that a memory augmented model with only 12 layers outperforms a baseline transformer model with 24 layers, while being twice as fast at inference time. We release our code for reproducibility purposes.
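Why product keys allow a fast yet exact search can be sketched in a few lines. The sketch below assumes that each full key is the concatenation of one sub-key from each of two sets and that keys are scored by inner product with the query; the function name `product_key_topk` and the index layout `i * n + j` are hypothetical. Because the score of a concatenated key is the sum of its two sub-scores, the best `k` full keys must lie among the `k x k` combinations of the best `k` sub-keys on each side.

```python
import heapq
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def product_key_topk(query, subkeys1, subkeys2, k):
    """Exact top-k over len(subkeys1) * len(subkeys2) keys; requires k <= size of each set."""
    half = len(query) // 2
    s1 = [dot(key, query[:half]) for key in subkeys1]   # scores for the first half
    s2 = [dot(key, query[half:]) for key in subkeys2]   # scores for the second half
    top1 = heapq.nlargest(k, range(len(s1)), key=s1.__getitem__)
    top2 = heapq.nlargest(k, range(len(s2)), key=s2.__getitem__)
    # Only k * k candidates are scored instead of all n * n full keys.
    candidates = [(s1[i] + s2[j], i * len(subkeys2) + j) for i in top1 for j in top2]
    return heapq.nlargest(k, candidates)                # list of (score, full-key index)


random.seed(0)
n, half_dim, k = 32, 4, 5
subkeys1 = [[random.gauss(0, 1) for _ in range(half_dim)] for _ in range(n)]
subkeys2 = [[random.gauss(0, 1) for _ in range(half_dim)] for _ in range(n)]
query = [random.gauss(0, 1) for _ in range(2 * half_dim)]

fast = product_key_topk(query, subkeys1, subkeys2, k)

# Brute-force check over all n * n concatenated keys.
brute = heapq.nlargest(
    k,
    ((dot(a + b, query), i * n + j)
     for i, a in enumerate(subkeys1)
     for j, b in enumerate(subkeys2)),
)
print([idx for _, idx in fast] == [idx for _, idx in brute])
```

The sketch covers only the key lookup. The memory values that the selected indices would retrieve, and any normalization or multi-head structure, are left out.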

Augmenting Self-attention with Persistent Memory

Jul 02, 2019

Sainbayar Sukhbaatar, Edouard Grave, Guillaume Lample, Herve Jegou, Armand Joulin

Abstract:Transformer networks have led to important progress in language modeling and machine translation. These models include two consecutive modules, a feed-forward layer and a self-attention layer. The latter allows the network to capture long term dependencies and is often regarded as the key ingredient in the success of Transformers. Building upon this intuition, we propose a new model that solely consists of attention layers. More precisely, we augment the self-attention layers with persistent memory vectors that play a similar role as the feed-forward layer. Thanks to these vectors, we can remove the feed-forward layer without degrading the performance of a transformer. Our evaluation shows the benefits brought by our model on standard character and word level language modeling benchmarks.
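One way to picture the augmentation is a single attention head whose keys and values are extended with extra vectors that do not depend on the input. The sketch below assumes the persistent vectors are simply appended to the context keys and values before the softmax; the function name and argument layout are hypothetical, and masking, multiple heads, and positional information are omitted.

```python
import math


def softmax(xs):
    m = max(xs)                                   # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_with_persistent_memory(queries, keys, values, mem_keys, mem_values):
    """Single-head attention over context vectors plus input-independent memory vectors."""
    all_keys = keys + mem_keys                    # context keys followed by persistent keys
    all_values = values + mem_values
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, key)) for key in all_keys]
        weights = softmax(scores)
        out = [sum(w * v[d] for w, v in zip(weights, all_values))
               for d in range(len(all_values[0]))]
        outputs.append(out)
    return outputs


# A zero query attends uniformly to the one context vector and the one memory vector.
result = attention_with_persistent_memory(
    queries=[[0.0, 0.0]],
    keys=[[1.0, 0.0]],
    values=[[2.0, 0.0]],
    mem_keys=[[0.0, 1.0]],
    mem_values=[[0.0, 4.0]],
)
print(result)
```

In a trained model the memory keys and values would be learned parameters shared across all positions, which is what lets them stand in for a position-wise layer.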

Two New Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English

Feb 04, 2019

Francisco Guzmán, Peng-Jen Chen, Myle Ott, Juan Pino, Guillaume Lample, Philipp Koehn, Vishrav Chaudhary, Marc'Aurelio Ranzato

Abstract:The vast majority of language pairs in the world are low-resource because they have little, if any, parallel data available. Unfortunately, machine translation (MT) systems do not currently work well in this setting. Besides the technical challenges of learning with limited supervision, there is also another challenge: it is very difficult to evaluate methods trained on low resource language pairs because there are very few freely and publicly available benchmarks. In this work, we take sentences from Wikipedia pages and introduce new evaluation datasets in two very low resource language pairs, Nepali-English and Sinhala-English. These are languages with very different morphology and syntax, for which little out-of-domain parallel data is available and for which relatively large amounts of monolingual data are freely available. We describe our process to collect and cross-check the quality of translations, and we report baseline performance using several learning settings: fully supervised, weakly supervised, semi-supervised, and fully unsupervised. Our experiments demonstrate that current state-of-the-art methods perform rather poorly on this benchmark, posing a challenge to the research community working on low resource MT. Data and code to reproduce our experiments are available at https://github.com/facebookresearch/flores.

Cross-lingual Language Model Pretraining

Jan 22, 2019

Guillaume Lample, Alexis Conneau

Abstract:Recent studies have demonstrated the efficiency of generative pretraining for English natural language understanding. In this work, we extend this approach to multiple languages and show the effectiveness of cross-lingual pretraining. We propose two methods to learn cross-lingual language models (XLMs): one unsupervised that only relies on monolingual data, and one supervised that leverages parallel data with a new cross-lingual language model objective. We obtain state-of-the-art results on cross-lingual classification, unsupervised and supervised machine translation. On XNLI, our approach pushes the state of the art by an absolute gain of 4.9% accuracy. On unsupervised machine translation, we obtain 34.3 BLEU on WMT'16 German-English, improving the previous state of the art by more than 9 BLEU. On supervised machine translation, we obtain a new state of the art of 38.5 BLEU on WMT'16 Romanian-English, outperforming the previous best approach by more than 4 BLEU. Our code and pretrained models will be made publicly available.

Multiple-Attribute Text Style Transfer

Nov 01, 2018

Sandeep Subramanian, Guillaume Lample, Eric Michael Smith, Ludovic Denoyer, Marc'Aurelio Ranzato, Y-Lan Boureau

Abstract:The dominant approach to unsupervised "style transfer" in text is based on the idea of learning a latent representation, which is independent of the attributes specifying its "style". In this paper, we show that this condition is not necessary and is not always met in practice, even with domain adversarial training that explicitly aims at learning such disentangled representations. We thus propose a new model that controls several factors of variation in textual data where this condition on disentanglement is replaced with a simpler mechanism based on back-translation. Our method allows control over multiple attributes, like gender, sentiment, product type, etc., and a more fine-grained control on the trade-off between content preservation and change of style with a pooling operator in the latent space. Our experiments demonstrate that the fully entangled model produces better generations, even when tested on new and more challenging benchmarks comprising reviews with multiple sentences and multiple attributes.
