David Grangier

Scaling Neural Machine Translation

Sep 04, 2018

Myle Ott, Sergey Edunov, David Grangier, Michael Auli

Abstract: Sequence to sequence learning models still require several days to reach state of the art performance on large benchmark datasets using a single machine. This paper shows that reduced precision and large batch training can speed up training by nearly 5x on a single 8-GPU machine with careful tuning and implementation. On WMT'14 English-German translation, we match the accuracy of Vaswani et al. (2017) in under 5 hours when training on 8 GPUs and we obtain a new state of the art of 29.3 BLEU after training for 85 minutes on 128 GPUs. We further improve these results to 29.8 BLEU by training on the much larger Paracrawl dataset. On the WMT'14 English-French task, we obtain a state-of-the-art BLEU of 43.2 in 8.5 hours on 128 GPUs.

* WMT 2018

Analyzing Uncertainty in Neural Machine Translation

Aug 13, 2018

Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato

Abstract: Machine translation is a popular test bed for research in neural sequence-to-sequence models but, despite much recent research, there is still a lack of understanding of these models. Practitioners report performance degradation with large beams, the under-estimation of rare words and a lack of diversity in the final translations. Our study relates some of these issues to the inherent uncertainty of the task, due to the existence of multiple valid translations for a single source sentence, and to the extrinsic uncertainty caused by noisy training data. We propose tools and metrics to assess how uncertainty in the data is captured by the model distribution and how it affects search strategies that generate translations. Our results show that search works remarkably well but that models tend to spread too much probability mass over the hypothesis space. Next, we propose tools to assess model calibration and show how to easily fix some shortcomings of current models. As part of this study, we release multiple human reference translations for two popular benchmarks.

* ICML 2018

QuaterNet: A Quaternion-based Recurrent Model for Human Motion

Jul 31, 2018

Dario Pavllo, David Grangier, Michael Auli

Abstract: Deep learning for predicting or generating 3D human pose sequences is an active research area. Previous work regresses either joint rotations or joint positions. The former strategy is prone to error accumulation along the kinematic chain, as well as discontinuities when using Euler angle or exponential map parameterizations. The latter requires re-projection onto skeleton constraints to avoid bone stretching and invalid configurations. This work addresses both limitations. Our recurrent network, QuaterNet, represents rotations with quaternions and our loss function performs forward kinematics on a skeleton to penalize absolute position errors instead of angle errors. On short-term predictions, QuaterNet improves the state-of-the-art quantitatively. For long-term generation, our approach is qualitatively judged as realistic as recent neural strategies from the graphics literature.

* British Machine Vision Conference (BMVC), 2018
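
The QuaterNet abstract describes a loss that runs forward kinematics on quaternion rotations and penalizes joint position errors. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a single kinematic chain rather than a full skeleton tree, a (w, x, y, z) component order, and a convention in which each bone offset is rotated by the accumulated rotation up to and including its own joint; the function names are hypothetical.

```python
import math

def quat_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` radians about `axis`."""
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    s = math.sin(angle / 2.0) / norm
    return (math.cos(angle / 2.0), ax * s, ay * s, az * s)

def quat_mul(a, b):
    """Hamilton product a * b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def quat_rotate(q, v):
    """Rotate the 3D vector v by the unit quaternion q via q * (0, v) * conj(q)."""
    w, x, y, z = q
    conj = (w, -x, -y, -z)
    _, rx, ry, rz = quat_mul(quat_mul(q, (0.0, *v)), conj)
    return (rx, ry, rz)

def forward_kinematics(local_rotations, bone_offsets):
    """Joint positions of a simple chain in which joint i is the child of joint i-1."""
    positions = []
    world_rot = (1.0, 0.0, 0.0, 0.0)  # identity rotation at the root
    position = (0.0, 0.0, 0.0)
    for q, offset in zip(local_rotations, bone_offsets):
        world_rot = quat_mul(world_rot, q)       # accumulate rotations down the chain
        step = quat_rotate(world_rot, offset)    # bone vector in world coordinates
        position = tuple(p + s for p, s in zip(position, step))
        positions.append(position)
    return positions

def position_loss(pred_rotations, target_rotations, bone_offsets):
    """Mean Euclidean distance between predicted and target joint positions."""
    pred = forward_kinematics(pred_rotations, bone_offsets)
    target = forward_kinematics(target_rotations, bone_offsets)
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)

# Two unit-length bones along x, each joint rotated 90 degrees about z.
q90 = quat_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2)
offsets = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
joints = forward_kinematics([q90, q90], offsets)
# joints is approximately [(0, 1, 0), (-1, 1, 0)]
```

Because the loss is computed on positions, an angle error at the first joint also moves every joint below it, so errors near the root of the chain weigh more than errors near the tip.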

Controllable Abstractive Summarization

May 18, 2018

Angela Fan, David Grangier, Michael Auli

Abstract: Current models for document summarization disregard user preferences such as the desired length, style, the entities that the user might be interested in, or how much of the document the user has already read. We present a neural summarization model with a simple but effective mechanism to enable users to specify these high-level attributes in order to control the shape of the final summaries to better suit their needs. With user input, our system can produce high quality summaries that follow user preferences. Without user input, we set the control variables automatically. On the full text CNN-Dailymail dataset, we outperform state of the art abstractive systems (both in terms of F1-ROUGE1 40.38 vs. 39.53 and human evaluation).

* ACL2018 Workshop on Neural Machine Translation and Generation (NMT@ACL)

Iterative Refinement for Machine Translation

Apr 13, 2018

Roman Novak, Michael Auli, David Grangier

Abstract: Existing machine translation decoding algorithms generate translations in a strictly monotonic fashion and never revisit previous decisions. As a result, earlier mistakes cannot be corrected at a later stage. In this paper, we present a translation scheme that starts from an initial guess and then makes iterative improvements that may revisit previous decisions. We parameterize our model as a convolutional neural network that predicts discrete substitutions to an existing translation based on an attention mechanism over both the source sentence and the current translation output. By making fewer than one modification per sentence on average, we improve the output of a phrase-based translation system by up to 0.4 BLEU on WMT15 German-English translation.

* Presented as a poster at BayLearn 2017

QuickEdit: Editing Text & Translations by Crossing Words Out

Mar 28, 2018

David Grangier, Michael Auli

Abstract: We propose a framework for computer-assisted text editing. It applies to translation post-editing and to paraphrasing. Our proposal relies on very simple interactions: a human editor modifies a sentence by marking tokens they would like the system to change. Our model then generates a new sentence which reformulates the initial sentence by avoiding marked words. The approach builds upon neural sequence-to-sequence modeling and introduces a neural network which takes as input a sentence along with change markers. Our model is trained on translation bitext by simulating post-edits. We demonstrate the advantage of our approach for translation post-editing through simulated post-edits. We also evaluate our model for paraphrasing through a user study.

* NAACL'18

Language Modeling with Gated Convolutional Networks

Sep 08, 2017

Yann N. Dauphin, Angela Fan, Michael Auli, David Grangier

Abstract: The predominant approach to language modeling to date is based on recurrent neural networks. Their success on this task is often linked to their ability to capture unbounded context. In this paper we develop a finite context approach through stacked convolutions, which can be more efficient since they allow parallelization over sequential tokens. We propose a novel simplified gating mechanism that outperforms Oord et al. (2016) and investigate the impact of key architectural decisions. The proposed approach achieves state-of-the-art on the WikiText-103 benchmark, even though it features long-term dependencies, as well as competitive results on the Google Billion Words benchmark. Our model reduces the latency to score a sentence by an order of magnitude compared to a recurrent baseline. To our knowledge, this is the first time a non-recurrent approach is competitive with strong recurrent models on these large scale language tasks.
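
To make the "finite context through stacked convolutions" idea concrete, here is a small sketch that is not taken from the paper. It assumes stride-1, undilated 1D convolutions with zero padding on the left so that position t never sees later inputs; `causal_conv1d` and `context_size` are hypothetical helper names.

```python
def causal_conv1d(xs, kernel):
    """out[t] = sum_j kernel[j] * xs[t - (k - 1) + j], using zero padding on the left."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(xs)
    return [sum(kernel[j] * padded[t + j] for j in range(k)) for t in range(len(xs))]

def context_size(kernel_widths):
    """Number of input positions (current one included) visible to the top layer."""
    return 1 + sum(k - 1 for k in kernel_widths)

# Two stacked layers of width 2 see three input positions in total.
xs = [1.0, 2.0, 3.0, 4.0]
layer1 = causal_conv1d(xs, [1.0, 1.0])
layer2 = causal_conv1d(layer1, [1.0, 1.0])
print(context_size([2, 2]))  # 3
```

Every output position can be computed independently of the others, which is the parallelization over tokens that the abstract contrasts with recurrent models. The price is that the context is bounded by the depth and kernel widths of the stack.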

Convolutional Sequence to Sequence Learning

Jul 25, 2017

Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin

Abstract: The prevalent approach to sequence to sequence learning maps an input sequence to a variable length output sequence via recurrent neural networks. We introduce an architecture based entirely on convolutional neural networks. Compared to recurrent models, computations over all elements can be fully parallelized during training and optimization is easier since the number of non-linearities is fixed and independent of the input length. Our use of gated linear units eases gradient propagation and we equip each decoder layer with a separate attention module. We outperform the accuracy of the deep LSTM setup of Wu et al. (2016) on both WMT'14 English-German and WMT'14 English-French translation at an order of magnitude faster speed, both on GPU and CPU.
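
The abstract names gated linear units without giving a formula. A common formulation, assumed here, splits a layer's output into two halves a and b and returns a multiplied elementwise by sigmoid(b). The sketch below applies it to a plain list of numbers; `glu` is an illustrative name, not the authors' code.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def glu(values):
    """Split `values` into halves a and b, then return a * sigmoid(b) elementwise."""
    if len(values) % 2 != 0:
        raise ValueError("gated linear unit needs an even number of inputs")
    half = len(values) // 2
    a, b = values[:half], values[half:]
    return [ai * sigmoid(bi) for ai, bi in zip(a, b)]

# With gate inputs of 0 the gate is 0.5, so each value is halved.
print(glu([2.0, -4.0, 0.0, 0.0]))  # [1.0, -2.0]
```

The first half passes through without a squashing non-linearity, so its gradient is only scaled by the gate. This is one reading of why the abstract says the units ease gradient propagation.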

A Convolutional Encoder Model for Neural Machine Translation

Jul 25, 2017

Jonas Gehring, Michael Auli, David Grangier, Yann N. Dauphin

Abstract: The prevalent approach to neural machine translation relies on bi-directional LSTMs to encode the source sentence. In this paper we present a faster and simpler architecture based on a succession of convolutional layers. This allows encoding the entire source sentence simultaneously, unlike recurrent networks, for which computation is constrained by temporal dependencies. On WMT'16 English-Romanian translation we achieve competitive accuracy to the state-of-the-art and we outperform several recently published results on the WMT'15 English-German task. Our models obtain almost the same accuracy as a very deep LSTM setup on WMT'14 English-French translation. Our convolutional encoder speeds up CPU decoding by more than two times at the same or higher accuracy as a strong bi-directional LSTM baseline.

* 13 pages

Efficient softmax approximation for GPUs

Jun 19, 2017

Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, Hervé Jégou

Abstract: We propose an approximate strategy to efficiently train neural network based language models over very large vocabularies. Our approach, called adaptive softmax, circumvents the linear dependency on the vocabulary size by exploiting the unbalanced word distribution to form clusters that explicitly minimize the expectation of computation time. Our approach further reduces the computational time by exploiting the specificities of modern architectures and matrix-matrix vector operations, making it particularly suited for graphics processing units. Our experiments carried out on standard benchmarks, such as EuroParl and One Billion Word, show that our approach brings a large gain in efficiency over standard approximations while achieving an accuracy close to that of the full softmax. The code of our method is available at https://github.com/facebookresearch/adaptive-softmax.

* Accepted to ICML 2017
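
The clustering idea in the adaptive softmax abstract can be illustrated with a two-level factorization. This is a sketch under stated assumptions, not the released code: frequent words are scored directly in a "head" softmax, the head also holds one entry per tail cluster, and a rare word's probability is its cluster probability times its probability within the cluster. The function names and hand-picked cluster layout are hypothetical, whereas the paper chooses clusters to minimize expected computation time.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def two_level_probs(head_logits, tail_logits, n_head_words):
    """Full-vocabulary probabilities from head scores and per-cluster tail scores.

    head_logits holds n_head_words word scores followed by one score per tail cluster.
    tail_logits is a list with one list of word scores per tail cluster.
    """
    if len(head_logits) != n_head_words + len(tail_logits):
        raise ValueError("head needs one entry per frequent word and per tail cluster")
    head = softmax(head_logits)
    probs = head[:n_head_words]
    for c, logits in enumerate(tail_logits):
        cluster_p = head[n_head_words + c]  # P(cluster | context)
        probs.extend(cluster_p * p for p in softmax(logits))  # times P(word | cluster)
    return probs

# Two frequent words plus one tail cluster of two rare words.
probs = two_level_probs([0.0, 0.0, 0.0], [[0.0, 0.0]], n_head_words=2)
```

The saving comes from the fact that a frequent target word only requires the small head softmax, and a tail cluster is evaluated only when the target falls inside it.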
