Michael Auli

Pay Less Attention with Lightweight and Dynamic Convolutions

Jan 29, 2019

Felix Wu, Angela Fan, Alexei Baevski, Yann N. Dauphin, Michael Auli

Abstract: Self-attention is a useful mechanism to build generative models for language and images. It determines the importance of context elements by comparing each element to the current time step. In this paper, we show that a very lightweight convolution can perform competitively with the best reported self-attention results. Next, we introduce dynamic convolutions which are simpler and more efficient than self-attention. We predict separate convolution kernels based solely on the current time step in order to determine the importance of context elements. The number of operations required by this approach scales linearly in the input length, whereas self-attention is quadratic. Experiments on large-scale machine translation, language modeling and abstractive summarization show that dynamic convolutions improve over strong self-attention models. On the WMT'14 English-German test set dynamic convolutions achieve a new state of the art of 29.7 BLEU.

* 14 pages, ICLR oral
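
The idea of predicting a convolution kernel from the current time step alone can be illustrated with a minimal sketch. This is not the authors' implementation: the softmax normalization of the predicted kernel, the causal window with zero padding, the single shared kernel across channels, and the names `dynamic_conv` and `w_kernel` are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_conv(x, w_kernel):
    """Apply a per-time-step predicted kernel to a sequence.

    x        -- list of T vectors, each a list of d floats
    w_kernel -- hypothetical K x d weight matrix; row k produces the
                logit for kernel position k from the current step only
    """
    seq_len, dim, width = len(x), len(x[0]), len(w_kernel)
    out = []
    for t in range(seq_len):
        # The kernel depends only on x[t], not on any other position.
        logits = [sum(w_kernel[k][j] * x[t][j] for j in range(dim))
                  for k in range(width)]
        kernel = softmax(logits)  # assumption: kernel weights sum to 1
        y = [0.0] * dim
        for k in range(width):
            src = t - k  # assumption: causal window over past steps
            if src < 0:
                continue  # zero padding before the sequence start
            for j in range(dim):
                y[j] += kernel[k] * x[src][j]
        out.append(y)
    return out
```

Each output step touches at most `K` inputs, so the work is proportional to `T * K * d`, which grows linearly in the sequence length `T` for a fixed kernel width, in contrast to the pairwise comparisons of self-attention.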

Modeling Human Motion with Quaternion-based Neural Networks

Jan 21, 2019

Dario Pavllo, Christoph Feichtenhofer, Michael Auli, David Grangier

Abstract: Previous work on predicting or generating 3D human pose sequences regresses either joint rotations or joint positions. The former strategy is prone to error accumulation along the kinematic chain, as well as discontinuities when using Euler angles or exponential maps as parameterizations. The latter requires re-projection onto skeleton constraints to avoid bone stretching and invalid configurations. This work addresses both limitations. QuaterNet represents rotations with quaternions and our loss function performs forward kinematics on a skeleton to penalize absolute position errors instead of angle errors. We investigate both recurrent and convolutional architectures and evaluate on short-term prediction and long-term generation. For the latter, our approach is qualitatively judged as realistic as recent neural strategies from the graphics literature. Our experiments compare quaternions to Euler angles as well as exponential maps and show that only a very short context is required to make reliable future predictions. Finally, we show that the standard evaluation protocol for Human3.6M produces high variance results and we propose a simple solution.

* arXiv admin note: substantial text overlap with arXiv:1805.06485
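
A loss that runs forward kinematics on quaternion rotations and penalizes position errors can be sketched as follows. This is a simplified illustration, not the QuaterNet code: the skeleton is assumed to be a single chain in which each joint's parent is the previous joint, quaternions are assumed to be unit length in `(w, x, y, z)` order, and the function names are hypothetical.

```python
import math


def quat_mul(a, b):
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def quat_rotate(q, v):
    """Rotate 3D vector v by unit quaternion q via q * (0, v) * conj(q)."""
    w, x, y, z = q
    conj = (w, -x, -y, -z)
    _, rx, ry, rz = quat_mul(quat_mul(q, (0.0, *v)), conj)
    return (rx, ry, rz)


def forward_kinematics(rotations, offsets):
    """Return world positions for a chain skeleton (assumed topology).

    rotations -- local unit quaternion per joint
    offsets   -- bone offset of each joint, expressed in its parent's frame
    """
    positions = []
    world_rot = (1.0, 0.0, 0.0, 0.0)  # identity rotation at the origin
    pos = (0.0, 0.0, 0.0)
    for q, off in zip(rotations, offsets):
        # Place the joint using the parent's accumulated rotation.
        step = quat_rotate(world_rot, off)
        pos = tuple(p + s for p, s in zip(pos, step))
        # Accumulate this joint's local rotation for its children.
        world_rot = quat_mul(world_rot, q)
        positions.append(pos)
    return positions


def position_loss(pred_rotations, target_positions, offsets):
    """Mean Euclidean joint position error after forward kinematics."""
    pred = forward_kinematics(pred_rotations, offsets)
    dists = [math.dist(p, t) for p, t in zip(pred, target_positions)]
    return sum(dists) / len(dists)
```

Because the error is measured on positions, a small rotation error near the root is penalized in proportion to how far it displaces every joint further down the chain, which an angle-based loss would not capture.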

3D human pose estimation in video with temporal convolutions and semi-supervised training

Nov 28, 2018

Dario Pavllo, Christoph Feichtenhofer, David Grangier, Michael Auli

Abstract: In this work, we demonstrate that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. We also introduce back-projection, a simple and effective semi-supervised training method that leverages unlabeled video data. We start with predicted 2D keypoints for unlabeled video, then estimate 3D poses and finally back-project to the input 2D keypoints. In the supervised setting, our fully-convolutional model outperforms the previous best result from the literature by 6 mm mean per-joint position error on Human3.6M, corresponding to an error reduction of 11%, and the model also shows significant improvements on HumanEva-I. Moreover, experiments with back-projection show that it comfortably outperforms previous state-of-the-art results in semi-supervised settings where labeled data is scarce. Code and models are available at https://github.com/facebookresearch/VideoPose3D

* 13 pages
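
The back-projection step described in the abstract (estimate 3D poses, then project them back onto the input 2D keypoints) can be illustrated with a small sketch. It assumes an ideal pinhole camera with a hypothetical focal length and principal point and a plain mean Euclidean error; the released VideoPose3D code may use a different camera model and loss, and the function names here are illustrative only.

```python
import math


def project(point3d, focal=1.0, center=(0.0, 0.0)):
    """Pinhole projection of a camera-space 3D point (assumes z > 0)."""
    x, y, z = point3d
    return (focal * x / z + center[0], focal * y / z + center[1])


def back_projection_loss(pred_3d, input_2d, focal=1.0, center=(0.0, 0.0)):
    """Mean 2D distance between projected 3D joints and input keypoints.

    pred_3d  -- list of (x, y, z) joints estimated by the 3D model
    input_2d -- list of (u, v) keypoints the estimate was derived from
    """
    total = 0.0
    for joint, keypoint in zip(pred_3d, input_2d):
        u, v = project(joint, focal, center)
        total += math.hypot(u - keypoint[0], v - keypoint[1])
    return total / len(pred_3d)
```

No 3D labels appear anywhere in this loss: the only supervision is agreement with the 2D keypoints, which is what lets unlabeled video contribute to training.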

Wizard of Wikipedia: Knowledge-Powered Conversational agents

Nov 03, 2018

Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, Jason Weston

Abstract: In open-domain dialogue, intelligent agents should exhibit the use of knowledge; however, there are few convincing demonstrations of this to date. The most popular sequence to sequence models typically "generate and hope" generic utterances that can be memorized in the weights of the model when mapping from input utterance(s) to output, rather than employing recalled knowledge as context. Use of knowledge has so far proved difficult, in part because of the lack of a supervised learning benchmark task which exhibits knowledgeable open dialogue with clear grounding. To that end we collect and release a large dataset with conversations directly grounded with knowledge retrieved from Wikipedia. We then design architectures capable of retrieving knowledge, reading and conditioning on it, and finally generating natural responses. Our best performing dialogue models are able to conduct knowledgeable discussions on open-domain topics as evaluated by automatic metrics and human evaluations, while our new benchmark allows for measuring further improvements in this important research direction.

Classical Structured Prediction Losses for Sequence to Sequence Learning

Oct 05, 2018

Sergey Edunov, Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato

Abstract: There has been much recent work on training neural attention models at the sequence-level using either reinforcement learning-style methods or by optimizing the beam. In this paper, we survey a range of classical objective functions that have been widely used to train linear models for structured prediction and apply them to neural sequence to sequence models. Our experiments show that these losses can perform surprisingly well by slightly outperforming beam search optimization in a like-for-like setup. We also report new state of the art results on both IWSLT'14 German-English translation and Gigaword abstractive summarization. On the larger WMT'14 English-French translation task, sequence-level training achieves 41.5 BLEU which is on par with the state of the art.

* 10 pages, NAACL 2018

Understanding Back-Translation at Scale

Oct 03, 2018

Sergey Edunov, Myle Ott, Michael Auli, David Grangier

Abstract: An effective method to improve neural machine translation with monolingual data is to augment the parallel training corpus with back-translations of target language sentences. This work broadens the understanding of back-translation and investigates a number of methods to generate synthetic source sentences. We find that in all but resource-poor settings back-translations obtained via sampling or noised beam outputs are most effective. Our analysis shows that sampling or noisy synthetic data gives a much stronger training signal than data generated by beam or greedy search. We also compare synthetic data to genuine bitext and study various domain effects. Finally, we scale to hundreds of millions of monolingual sentences and achieve a new state of the art of 35 BLEU on the WMT'14 English-German test set.

* 12 pages; EMNLP 2018

Adaptive Input Representations for Neural Language Modeling

Oct 01, 2018

Alexei Baevski, Michael Auli

Abstract: We introduce adaptive input representations for neural language modeling which extend the adaptive softmax of Grave et al. (2017) to input representations of variable capacity. There are several choices on how to factorize the input and output layers, and whether to model words, characters or sub-word units. We perform a systematic comparison of popular choices for a self-attentional architecture. Our experiments show that models equipped with adaptive embeddings are more than twice as fast to train as the popular character input CNN while having a lower number of parameters. We achieve a new state of the art on the WikiText-103 benchmark of 20.51 perplexity, improving the next best known result by 8.7 perplexity. On the Billion word benchmark, we achieve a state of the art of 24.14 perplexity.

* 12 pages
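
One way to picture input representations of variable capacity is an embedding layer that gives frequent words wide vectors and rare words narrow ones, then projects every band to a common width. The sketch below is an assumption-laden illustration rather than the paper's implementation: the frequency-sorted vocabulary, the cutoff values, the reduction factor of 4, the random initialization, and the class name `AdaptiveInput` are all chosen here for demonstration.

```python
import random


class AdaptiveInput:
    """Embedding with smaller vectors for rarer (higher-id) tokens."""

    def __init__(self, cutoffs, dim, factor=4, seed=0):
        # Assumption: token ids are sorted by descending frequency, and
        # cutoffs are increasing band ends, e.g. [10, 100, 1000].
        rng = random.Random(seed)
        self.dim = dim
        self.bands = []  # (start, end, table, projection)
        start = 0
        for i, end in enumerate(cutoffs):
            band_dim = max(1, dim // factor ** i)
            table = [[rng.gauss(0.0, 0.1) for _ in range(band_dim)]
                     for _ in range(end - start)]
            # Linear map from the band's width up to the shared width.
            proj = [[rng.gauss(0.0, 0.1) for _ in range(band_dim)]
                    for _ in range(dim)]
            self.bands.append((start, end, table, proj))
            start = end

    def embed(self, token_id):
        """Look up the band-specific vector and project it to `dim`."""
        for start, end, table, proj in self.bands:
            if start <= token_id < end:
                vec = table[token_id - start]
                return [sum(p * v for p, v in zip(row, vec)) for row in proj]
        raise IndexError(f"token id {token_id} is outside the vocabulary")

    def num_parameters(self):
        """Count embedding and projection weights across all bands."""
        return sum(len(table) * len(table[0]) + len(proj) * len(proj[0])
                   for _, _, table, proj in self.bands)
```

With most of the vocabulary falling into the narrow bands, the layer stores far fewer weights than a single full-width table of the same vocabulary size, while every token still yields a vector of the shared width.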

Scaling Neural Machine Translation

Sep 04, 2018

Myle Ott, Sergey Edunov, David Grangier, Michael Auli

Abstract: Sequence to sequence learning models still require several days to reach state of the art performance on large benchmark datasets using a single machine. This paper shows that reduced precision and large batch training can speed up training by nearly 5x on a single 8-GPU machine with careful tuning and implementation. On WMT'14 English-German translation, we match the accuracy of Vaswani et al. (2017) in under 5 hours when training on 8 GPUs and we obtain a new state of the art of 29.3 BLEU after training for 85 minutes on 128 GPUs. We further improve these results to 29.8 BLEU by training on the much larger Paracrawl dataset. On the WMT'14 English-French task, we obtain a state-of-the-art BLEU of 43.2 in 8.5 hours on 128 GPUs.

* WMT 2018

Analyzing Uncertainty in Neural Machine Translation

Aug 13, 2018

Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato

Abstract: Machine translation is a popular test bed for research in neural sequence-to-sequence models, but despite much recent research, there is still a lack of understanding of these models. Practitioners report performance degradation with large beams, the under-estimation of rare words and a lack of diversity in the final translations. Our study relates some of these issues to the inherent uncertainty of the task, due to the existence of multiple valid translations for a single source sentence, and to the extrinsic uncertainty caused by noisy training data. We propose tools and metrics to assess how uncertainty in the data is captured by the model distribution and how it affects search strategies that generate translations. Our results show that search works remarkably well but that models tend to spread too much probability mass over the hypothesis space. Next, we propose tools to assess model calibration and show how to easily fix some shortcomings of current models. As part of this study, we release multiple human reference translations for two popular benchmarks.

* ICML 2018

QuaterNet: A Quaternion-based Recurrent Model for Human Motion

Jul 31, 2018

Dario Pavllo, David Grangier, Michael Auli

Abstract: Deep learning for predicting or generating 3D human pose sequences is an active research area. Previous work regresses either joint rotations or joint positions. The former strategy is prone to error accumulation along the kinematic chain, as well as discontinuities when using Euler angle or exponential map parameterizations. The latter requires re-projection onto skeleton constraints to avoid bone stretching and invalid configurations. This work addresses both limitations. Our recurrent network, QuaterNet, represents rotations with quaternions and our loss function performs forward kinematics on a skeleton to penalize absolute position errors instead of angle errors. On short-term predictions, QuaterNet improves the state-of-the-art quantitatively. For long-term generation, our approach is qualitatively judged as realistic as recent neural strategies from the graphics literature.

* British Machine Vision Conference (BMVC), 2018
