Ann Lee

Few-shot Sequence Learning with Transformers

Dec 17, 2020
Lajanugen Logeswaran, Ann Lee, Myle Ott, Honglak Lee, Marc'Aurelio Ranzato, Arthur Szlam

Few-shot algorithms aim at learning new tasks given only a handful of training examples. In this work we investigate few-shot learning in the setting where the data points are sequences of tokens and propose an efficient learning algorithm based on Transformers. In the simplest setting, we append to the input sequence a token that represents the particular task to be undertaken, and show that the embedding of this token can be optimized on the fly given a few labeled examples. Our approach requires neither complicated changes to the model architecture, such as adapter layers, nor the computation of second-order derivatives, as is currently popular in the meta-learning and few-shot learning literature. We demonstrate our approach on a variety of tasks, and analyze the generalization properties of several model variants and baseline approaches. In particular, we show that compositional task descriptors can improve performance. Experiments show that our approach works at least as well as other methods, while being more computationally efficient.
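
The idea of optimizing only a task embedding while the rest of the model stays fixed can be illustrated with a toy sketch. This is not the paper's implementation: the frozen model here is a small logistic regression rather than a Transformer, and every name (`FROZEN_W`, `SUPPORT`, `adapt_task_embedding`) and every number is a hypothetical chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(frozen_w, task_emb, x):
    # Toy frozen model: logistic regression over [input features, task embedding].
    feats = list(x) + list(task_emb)
    return sigmoid(sum(w * f for w, f in zip(frozen_w, feats)))

def mean_loss(frozen_w, task_emb, examples):
    # Mean binary cross-entropy over the few labeled (support) examples.
    total = 0.0
    for x, y in examples:
        p = predict(frozen_w, task_emb, x)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(examples)

def adapt_task_embedding(frozen_w, examples, emb_dim, lr=0.5, steps=200):
    # Gradient descent on the task embedding only; frozen_w is never modified.
    task_emb = [0.0] * emb_dim
    for _ in range(steps):
        grad = [0.0] * emb_dim
        for x, y in examples:
            p = predict(frozen_w, task_emb, x)
            for j in range(emb_dim):
                # d(BCE)/d(emb_j) = (p - y) * weight attached to emb_j
                grad[j] += (p - y) * frozen_w[len(x) + j]
        task_emb = [e - lr * g / len(examples) for e, g in zip(task_emb, grad)]
    return task_emb

# Hypothetical frozen weights: two input features, one task-embedding dimension.
FROZEN_W = [1.0, -1.0, 2.0]
# Hypothetical support set for a new task: (features, binary label).
SUPPORT = [([2.0, 0.0], 1), ([0.0, 0.0], 0), ([3.0, 1.0], 1), ([1.0, 1.0], 0)]

if __name__ == "__main__":
    emb = adapt_task_embedding(FROZEN_W, SUPPORT, emb_dim=1)
    print("loss before:", mean_loss(FROZEN_W, [0.0], SUPPORT))
    print("loss after: ", mean_loss(FROZEN_W, emb, SUPPORT))
```

Because only `emb_dim` parameters are updated and only first-order gradients are needed, adaptation of this kind is cheap compared with fine-tuning all weights, which matches the efficiency argument made in the abstract.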

* NeurIPS Meta-Learning Workshop 2020

Facebook AI's WMT20 News Translation Task Submission

Nov 16, 2020
Peng-Jen Chen, Ann Lee, Changhan Wang, Naman Goyal, Angela Fan, Mary Williamson, Jiatao Gu

This paper describes Facebook AI's submission to the WMT20 shared news translation task. We focus on the low-resource setting and participate in two language pairs, Tamil <-> English and Inuktitut <-> English, where there are limited out-of-domain bitext and monolingual data. We approach the low-resource problem using two main strategies: leveraging all available data and adapting the system to the target news domain. We explore techniques that leverage bitext and monolingual data from all languages, such as self-supervised model pretraining, multilingual models, data augmentation, and reranking. To better adapt the translation system to the test domain, we explore dataset tagging and fine-tuning on in-domain data. We observe that different techniques provide varied improvements depending on the data available for the language pair. Based on these findings, we integrate these techniques into one training pipeline. For En->Ta, we explore an unconstrained setup with additional Tamil bitext and monolingual data and show that further improvement can be obtained. On the test set, our best submitted systems achieve 21.5 and 13.7 BLEU for Ta->En and En->Ta respectively, and 27.9 and 13.0 for Iu->En and En->Iu respectively.
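
Dataset tagging, as mentioned above, is commonly done by prepending a marker token to each source sentence so the model can condition on where the data came from. The sketch below shows that general idea only; the tag format (`<news>`) and the function names are assumptions, not details taken from the submission.

```python
def tag_source(sentence, domain):
    # Prepend a domain tag token (hypothetical format) to a source sentence.
    return f"<{domain}> {sentence}"

def tag_corpus(pairs, domain):
    # Tag only the source side of each (source, target) pair.
    return [(tag_source(src, domain), tgt) for src, tgt in pairs]

if __name__ == "__main__":
    # Hypothetical bitext with placeholder strings.
    bitext = [("source sentence one", "target sentence one")]
    print(tag_corpus(bitext, "news"))
```

At inference time, the tag matching the test domain would be prepended to every input so that the model produces output in that domain's style.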

Semi-Supervised Speech Recognition via Local Prior Matching

Feb 24, 2020
Wei-Ning Hsu, Ann Lee, Gabriel Synnaeve, Awni Hannun

For sequence transduction tasks like speech recognition, a strong structured prior model encodes rich information about the target space, implicitly ruling out invalid sequences by assigning them low probability. In this work, we propose local prior matching (LPM), a semi-supervised objective that distills knowledge from a strong prior (e.g. a language model) to provide a learning signal to a discriminative model trained on unlabeled speech. We demonstrate that LPM is theoretically well-motivated, simple to implement, and superior to existing knowledge distillation techniques under comparable settings. Starting from a baseline trained on 100 hours of labeled speech, with an additional 360 hours of unlabeled data, LPM recovers 54% and 73% of the word error rate on clean and noisy test sets, respectively, relative to a fully supervised model on the same data.
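
One plausible reading of the objective described above is a cross-entropy between the prior and the model posterior, with both renormalized over a small set of candidate transcriptions (for example, beam-search hypotheses) for an unlabeled utterance. The sketch below implements that reading under stated assumptions; the function names and the example scores are hypothetical, and the exact formulation should be checked against the paper.

```python
import math

def log_softmax(scores):
    # Numerically stable log-softmax over a list of log-scores.
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]

def local_prior_matching_loss(lm_log_probs, model_log_probs):
    # Cross-entropy between the prior and the model posterior,
    # both renormalized over the same candidate hypotheses.
    prior = [math.exp(v) for v in log_softmax(lm_log_probs)]
    log_post = log_softmax(model_log_probs)
    return -sum(p * lq for p, lq in zip(prior, log_post))

if __name__ == "__main__":
    # Hypothetical log-scores for three candidate transcriptions.
    lm_scores = [-1.0, -2.0, -4.0]
    model_scores = [-4.0, -2.0, -1.0]
    print(local_prior_matching_loss(lm_scores, model_scores))
```

The loss is smallest when the model ranks the candidates the same way the prior does, which is how the prior supplies a training signal without any transcribed labels.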

Self-Training for End-to-End Speech Recognition

Sep 19, 2019
Jacob Kahn, Ann Lee, Awni Hannun

We revisit self-training in the context of end-to-end speech recognition. We demonstrate that training with pseudo-labels can substantially improve the accuracy of a baseline model by leveraging unlabelled data. Key to our approach are a strong baseline acoustic and language model used to generate the pseudo-labels, a robust and stable beam-search decoder, and a novel ensemble approach used to increase pseudo-label diversity. Experiments on the LibriSpeech corpus show that self-training with a single model can yield a 21% relative WER improvement on clean data over a baseline trained on 100 hours of labelled data. We also evaluate label filtering approaches to increase pseudo-label quality. With an ensemble of six models in conjunction with label filtering, self-training yields a 26% relative improvement and bridges 55.6% of the gap between the baseline and an oracle model trained with all of the labels.
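
The two figures quoted above, relative WER improvement and the share of the baseline-to-oracle gap that is bridged, follow from simple ratios. The sketch below assumes the usual definitions of both quantities; the WER values in the usage example are made up for illustration and are not results from the paper.

```python
def relative_improvement(baseline_wer, new_wer):
    # Fraction by which WER drops relative to the baseline.
    return (baseline_wer - new_wer) / baseline_wer

def gap_bridged(baseline_wer, new_wer, oracle_wer):
    # Fraction of the baseline-to-oracle WER gap closed by the new model.
    return (baseline_wer - new_wer) / (baseline_wer - oracle_wer)

if __name__ == "__main__":
    # Hypothetical WERs (percent): baseline, self-trained model, oracle.
    print(relative_improvement(10.0, 8.0))
    print(gap_bridged(10.0, 8.0, 5.0))
```

The two numbers answer different questions: the first compares against the starting point only, while the second also accounts for how much headroom the oracle model leaves.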

Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions

Apr 04, 2019
Awni Hannun, Ann Lee, Qiantong Xu, Ronan Collobert

We propose a fully convolutional sequence-to-sequence encoder architecture with a simple and efficient decoder. Our model improves WER on LibriSpeech while being an order of magnitude more efficient than a strong RNN baseline. Key to our approach is a time-depth separable convolution block which dramatically reduces the number of parameters in the model while keeping the receptive field large. We also give a stable and efficient beam search inference procedure which allows us to effectively integrate a language model. Coupled with a convolutional language model, our time-depth separable convolution architecture improves by more than 22% relative WER over the best previously reported sequence-to-sequence results on the noisy LibriSpeech test set.
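
To see why separating time from depth can shrink the parameter count, consider an input of shape T x w x c (time, feature width, channels). The sketch below compares a standard 1D convolution over the flattened w * c channels with a time-only convolution over c channels whose weights are shared across the width dimension. It is a rough count under assumptions: biases are ignored, the fully connected part of the block is not counted, and the sizes in the usage example are hypothetical.

```python
def conv1d_params(kernel, width, channels):
    # Standard 1D conv over time with (width * channels) input and output channels.
    flat = width * channels
    return kernel * flat * flat

def time_only_conv_params(kernel, channels):
    # (kernel x 1) conv over time with `channels` input and output channels;
    # weights are shared across the width dimension, so width does not appear.
    return kernel * channels * channels

if __name__ == "__main__":
    # Hypothetical sizes: kernel of 21 frames, 80 features, 10 channels.
    k, w, c = 21, 80, 10
    print(conv1d_params(k, w, c), time_only_conv_params(k, c))
```

Under these assumptions the convolution's parameter count shrinks by a factor of w squared, while the kernel length, and therefore the receptive field over time, is unchanged.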