Angela Fan

Using Local Knowledge Graph Construction to Scale Seq2Seq Models to Multi-Document Inputs

Oct 18, 2019
Angela Fan, Claire Gardent, Chloe Braud, Antoine Bordes

Query-based open-domain NLP tasks require information synthesis from long and diverse web results. Current approaches extractively select portions of web text as input to Sequence-to-Sequence models using methods such as TF-IDF ranking. We propose constructing a local graph structured knowledge base for each query, which compresses the web search information and reduces redundancy. We show that by linearizing the graph into a structured input sequence, models can encode the graph representations within a standard Sequence-to-Sequence setting. For two generative tasks with very long text input, long-form question answering and multi-document summarization, feeding graph representations as input can achieve better performance than using retrieved text portions.
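
The linearization step described above can be illustrated with a minimal sketch. It assumes the facts have already been extracted from the web results as (subject, relation, object) triples; the marker tokens `<sub>`, `<rel>`, and `<obj>` and both function names are hypothetical, chosen for illustration rather than taken from the paper.

```python
from collections import defaultdict


def build_local_graph(triples):
    """Merge (subject, relation, object) triples into an adjacency map.

    Exact duplicate edges are dropped, which is one simple way a graph
    can reduce the redundancy found in overlapping web results.
    """
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        edge = (rel, obj)
        if edge not in graph[subj]:
            graph[subj].append(edge)
    return graph


def linearize(graph):
    """Flatten the graph into one token sequence for a Seq2Seq encoder.

    The marker tokens are assumptions made for this sketch.
    """
    tokens = []
    for subj, edges in graph.items():
        tokens.append("<sub>")
        tokens.extend(subj.split())
        for rel, obj in edges:
            tokens.append("<rel>")
            tokens.extend(rel.split())
            tokens.append("<obj>")
            tokens.extend(obj.split())
    return tokens


triples = [
    ("Marie Curie", "won", "Nobel Prize"),
    ("Marie Curie", "won", "Nobel Prize"),  # repeated fact from a second page
    ("Marie Curie", "born in", "Warsaw"),
]
sequence = linearize(build_local_graph(triples))
print(" ".join(sequence))
```

In this toy input the repeated fact is stored once and the subject is emitted once for both of its edges, so the linearized sequence is shorter than the concatenated source triples.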

Reducing Transformer Depth on Demand with Structured Dropout

Sep 25, 2019
Angela Fan, Edouard Grave, Armand Joulin

Overparameterized transformer networks have obtained state-of-the-art results in various natural language processing tasks, such as machine translation, language modeling, and question answering. These models contain hundreds of millions of parameters, necessitating a large amount of computation and making them prone to overfitting. In this work, we explore LayerDrop, a form of structured dropout, which has a regularization effect during training and allows for efficient pruning at inference time. In particular, we show that it is possible to select sub-networks of any depth from one large network without having to finetune them and with limited impact on performance. We demonstrate the effectiveness of our approach by improving the state of the art on machine translation, language modeling, summarization, question answering, and language understanding benchmarks. Moreover, we show that our approach leads to small BERT-like models of higher quality compared to training from scratch or using distillation.
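
As a rough illustration of the idea, the sketch below drops whole layers at random during training and then picks a shallower sub-network at inference. The layers are toy callables rather than transformer blocks, and the evenly spaced selection rule in `prune_to_depth` is an assumption made for this example; the abstract does not say which layers are kept. Both function names are hypothetical.

```python
import random


def forward_with_layerdrop(x, layers, drop_prob, rng, training=True):
    """Apply layers in order, skipping each whole layer with probability
    drop_prob while training. A skipped layer acts as the identity."""
    for layer in layers:
        if training and rng.random() < drop_prob:
            continue
        x = layer(x)
    return x


def prune_to_depth(layers, depth):
    """Keep `depth` layers spread evenly over the stack.

    The even spacing is a hypothetical selection rule for this sketch.
    """
    if not 0 < depth <= len(layers):
        raise ValueError("depth must be between 1 and len(layers)")
    step = len(layers) / depth
    return [layers[int(i * step)] for i in range(depth)]


# Toy layers: each one records its own index so the path is visible.
layers = [lambda trace, i=i: trace + [i] for i in range(6)]

rng = random.Random(0)
train_path = forward_with_layerdrop([], layers, drop_prob=0.5, rng=rng)
full_path = forward_with_layerdrop([], layers, 0.5, rng, training=False)
small_path = forward_with_layerdrop(
    [], prune_to_depth(layers, 3), 0.5, rng, training=False
)
print(train_path, full_path, small_path)
```

Because the network is trained with layers randomly missing, running it with a fixed subset of layers at inference resembles a condition it has already seen, which is the intuition behind pruning without finetuning.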

ELI5: Long Form Question Answering

Jul 22, 2019
Angela Fan, Yacine Jernite, Ethan Perez, David Grangier, Jason Weston, Michael Auli

We introduce the first large-scale corpus for long-form question answering, a task requiring elaborate and in-depth answers to open-ended questions. The dataset comprises 270K threads from the Reddit forum "Explain Like I'm Five" (ELI5), where an online community provides answers to questions that are comprehensible to five-year-olds. Compared to existing datasets, ELI5 comprises diverse questions requiring multi-sentence answers. We provide a large set of web documents to help answer the question. Automatic and human evaluations show that an abstractive model trained with a multi-task objective outperforms conventional Seq2Seq, language modeling, as well as a strong extractive baseline. However, our best model is still far from human performance since raters prefer gold responses in over 86% of cases, leaving ample opportunity for future improvement.

GLOSS: Generative Latent Optimization of Sentence Representations

Jul 15, 2019
Sidak Pal Singh, Angela Fan, Michael Auli

We propose a method to learn unsupervised sentence representations in a non-compositional manner based on Generative Latent Optimization. Our approach does not impose any assumptions on how words are to be combined into a sentence representation. We discuss a simple Bag of Words model as well as a variant that models word positions. Both are trained to reconstruct the sentence based on a latent code and our model can be used to generate text. Experiments show large improvements over the related Paragraph Vectors. Compared to uSIF, we achieve a relative improvement of 5% when trained on the same data and our method performs competitively to Sent2vec while trained on 30 times less data.

fairseq: A Fast, Extensible Toolkit for Sequence Modeling

Apr 01, 2019
Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli

fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks. The toolkit is based on PyTorch and supports distributed training across multiple GPUs and machines. We also support fast mixed-precision training and inference on modern GPUs. A demo video can be found at https://www.youtube.com/watch?v=OtgDdWtHvto

* NAACL 2019 Demo paper

Learning to Speak and Act in a Fantasy Text Adventure Game

Mar 07, 2019
Jack Urbanek, Angela Fan, Siddharth Karamcheti, Saachi Jain, Samuel Humeau, Emily Dinan, Tim Rocktäschel, Douwe Kiela, Arthur Szlam, Jason Weston

We introduce a large-scale crowdsourced text adventure game as a research platform for studying grounded dialogue. In it, agents can perceive, emote, and act whilst conducting dialogue with other agents. Models and humans can both act as characters within the game. We describe the results of training state-of-the-art generative and retrieval models in this setting. We show that in addition to using past dialogue, these models are able to effectively use the state of the underlying world to condition their predictions. In particular, we show that grounding on the details of the local environment, including location descriptions, and the objects (and their affordances) and characters (and their previous actions) present within it allows better predictions of agent behavior and dialogue. We analyze the ingredients necessary for successful grounding in this setting, and how each of these factors relates to agents that can talk and act successfully.

Strategies for Structuring Story Generation

Feb 04, 2019
Angela Fan, Mike Lewis, Yann Dauphin

Writers generally rely on plans or sketches to write long stories, but most current language models generate word by word from left to right. We explore coarse-to-fine models for creating narrative texts of several hundred words, and introduce new models which decompose stories by abstracting over actions and entities. The model first generates the predicate-argument structure of the text, where different mentions of the same entity are marked with placeholder tokens. It then generates a surface realization of the predicate-argument structure, and finally replaces the entity placeholders with context-sensitive names and references. Human judges prefer the stories from our models to a wide range of previous approaches to hierarchical text generation. Extensive analysis shows that our methods can help improve the diversity and coherence of events and entities in generated stories.

Pay Less Attention with Lightweight and Dynamic Convolutions

Jan 29, 2019
Felix Wu, Angela Fan, Alexei Baevski, Yann N. Dauphin, Michael Auli

Self-attention is a useful mechanism to build generative models for language and images. It determines the importance of context elements by comparing each element to the current time step. In this paper, we show that a very lightweight convolution can perform competitively to the best reported self-attention results. Next, we introduce dynamic convolutions which are simpler and more efficient than self-attention. We predict separate convolution kernels based solely on the current time-step in order to determine the importance of context elements. The number of operations required by this approach scales linearly in the input length, whereas self-attention is quadratic. Experiments on large-scale machine translation, language modeling and abstractive summarization show that dynamic convolutions improve over strong self-attention models. On the WMT'14 English-German test set dynamic convolutions achieve a new state of the art of 29.7 BLEU.
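
To make the linear scaling concrete, here is a toy dynamic convolution over a sequence of scalars. At each position the kernel is predicted from the current input alone and then applied to a fixed-width window, so the work is roughly `len(xs) * width` operations. The linear kernel predictor `proj`, the softmax normalization, the causal zero-padded window, and the function names are all assumptions made for this sketch, not details stated in the abstract.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_conv_1d(xs, proj, width):
    """Toy dynamic convolution.

    proj holds `width` (weight, bias) pairs; each pair maps the current
    input x_t to one kernel logit (a hypothetical parameterization).
    """
    out = []
    for t, x_t in enumerate(xs):
        # The kernel depends only on the current time step.
        kernel = softmax([w * x_t + b for w, b in proj])
        # Window of `width` inputs ending at t, zero-padded on the left.
        window = [
            xs[t - width + 1 + j] if t - width + 1 + j >= 0 else 0.0
            for j in range(width)
        ]
        out.append(sum(k * v for k, v in zip(kernel, window)))
    return out


xs = [3.0, 6.0, 9.0]
uniform = [(0.0, 0.0)] * 3  # zero logits give equal kernel weights
ys = dynamic_conv_1d(xs, uniform, width=3)
print(ys)
```

Self-attention, by contrast, compares each position with every other position, which is where its quadratic cost in the input length comes from.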

* 14 pages, ICLR oral

Wizard of Wikipedia: Knowledge-Powered Conversational agents

Nov 03, 2018
Emily Dinan, Stephen Roller, Kurt Shuster, Angela Fan, Michael Auli, Jason Weston

In open-domain dialogue, intelligent agents should exhibit the use of knowledge; however, there are few convincing demonstrations of this to date. The most popular sequence-to-sequence models typically "generate and hope" generic utterances that can be memorized in the weights of the model when mapping from input utterance(s) to output, rather than employing recalled knowledge as context. Use of knowledge has so far proved difficult, in part because of the lack of a supervised learning benchmark task which exhibits knowledgeable open dialogue with clear grounding. To that end, we collect and release a large dataset with conversations directly grounded with knowledge retrieved from Wikipedia. We then design architectures capable of retrieving knowledge, reading and conditioning on it, and finally generating natural responses. Our best performing dialogue models are able to conduct knowledgeable discussions on open-domain topics as evaluated by automatic metrics and human evaluations, while our new benchmark allows for measuring further improvements in this important research direction.

Controllable Abstractive Summarization

May 18, 2018
Angela Fan, David Grangier, Michael Auli

Current models for document summarization disregard user preferences such as the desired length, style, the entities that the user might be interested in, or how much of the document the user has already read. We present a neural summarization model with a simple but effective mechanism to enable users to specify these high-level attributes in order to control the shape of the final summaries to better suit their needs. With user input, our system can produce high-quality summaries that follow user preferences. Without user input, we set the control variables automatically. On the full text CNN-Dailymail dataset, we outperform state-of-the-art abstractive systems (both in terms of F1-ROUGE1 40.38 vs. 39.53 and human evaluation).

* ACL2018 Workshop on Neural Machine Translation and Generation (NMT@ACL)
