Noah A. Smith

Contextual Word Representations: A Contextual Introduction

Feb 19, 2019
Noah A. Smith

This introduction aims to tell the story of how we put words into computers. It is part of the story of the field of natural language processing (NLP), a branch of artificial intelligence. It targets a wide audience with a basic understanding of computer programming, but avoids a detailed mathematical treatment, and it does not present any algorithms. It also does not focus on any particular application of NLP such as translation, question answering, or information extraction. The ideas presented here were developed by many researchers over many decades, so the citations are not exhaustive but rather direct the reader to a handful of papers that are, in the author's view, seminal. After reading this document, you should have a general understanding of word vectors (also known as word embeddings): why they exist, what problems they solve, where they come from, how they have changed over time, and what some of the open questions about them are. Readers already familiar with word vectors are advised to skip to Section 5 for the discussion of the most recent advance, contextual word vectors.

Deep Weighted Averaging Classifiers

Nov 18, 2018
Dallas Card, Michael Zhang, Noah A. Smith

Recent advances in deep learning have achieved impressive gains in classification accuracy on a variety of types of data, including images and text. Despite these gains, however, concerns have been raised about the calibration, robustness, and interpretability of these models. In this paper we propose a simple way to modify any conventional deep architecture to automatically provide more transparent explanations for classification decisions, as well as an intuitive notion of the credibility of each prediction. Specifically, we draw on ideas from nonparametric kernel regression, and propose to predict labels based on a weighted sum of training instances, where the weights are determined by distance in a learned instance-embedding space. Working within the framework of conformal methods, we propose a new measure of nonconformity suggested by our model, and experimentally validate the accompanying theoretical expectations, demonstrating improved transparency, controlled error rates, and robustness to out-of-domain data, without compromising on accuracy or calibration.

* 13 pages, 8 figures, 5 tables, added DOI and updated to meet ACM formatting requirements, In Proceedings of FAT* (2019)
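
The prediction rule in the abstract above (a weighted sum over training instances, with weights set by distance in an embedding space) can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, the `bandwidth` parameter, the function names, and the hand-written 2-D embeddings are all assumptions made for illustration. In the paper the embeddings come from a learned deep network.

```python
import math
from collections import defaultdict

def gaussian_kernel(u, v, bandwidth=1.0):
    """Weight that decays with squared Euclidean distance (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def weighted_average_predict(query, train_embeddings, train_labels, bandwidth=1.0):
    """Return (predicted_label, class_probabilities, per_instance_weights)."""
    # One weight per training instance: closer instances count for more.
    weights = [gaussian_kernel(query, e, bandwidth) for e in train_embeddings]
    # Sum the weights of the training instances belonging to each class.
    class_scores = defaultdict(float)
    for w, label in zip(weights, train_labels):
        class_scores[label] += w
    total = sum(class_scores.values())
    if total == 0.0:
        # Every weight underflowed to zero: fall back to a uniform distribution.
        probs = {label: 1.0 / len(class_scores) for label in class_scores}
    else:
        probs = {label: s / total for label, s in class_scores.items()}
    predicted = max(probs, key=probs.get)
    return predicted, probs, weights

# Toy stand-ins for learned instance embeddings (hypothetical values).
train_embeddings = [(0.0, 0.0), (0.1, 0.0), (2.0, 2.0), (2.1, 2.0)]
train_labels = ["neg", "neg", "pos", "pos"]

predicted, probs, weights = weighted_average_predict(
    (1.9, 2.0), train_embeddings, train_labels
)
# The per-instance weights double as an explanation: the largest weight
# identifies the training instance that contributed most to the decision.
most_influential = weights.index(max(weights))
```

Because every prediction is a sum over training instances, the instances with the largest weights can be shown to a user as evidence for the decision, which is the kind of transparency the abstract describes.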

ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning

Oct 31, 2018
Maarten Sap, Ronan LeBras, Emily Allaway, Chandra Bhagavatula, Nicholas Lourie, Hannah Rashkin, Brendan Roof, Noah A. Smith, Yejin Choi

We present ATOMIC, an atlas of everyday commonsense reasoning, organized through 300k textual descriptions. Compared to existing resources that center around taxonomic knowledge, ATOMIC focuses on inferential knowledge organized as typed if-then relations with variables (e.g., "if X pays Y a compliment, then Y will likely return the compliment"). We propose nine if-then relation types to distinguish causes vs. effects, agents vs. themes, voluntary vs. involuntary events, and actions vs. mental states. By generatively training on the rich inferential knowledge described in ATOMIC, we show that neural models can acquire simple commonsense capabilities and reason about previously unseen events. Experimental results demonstrate that multitask models that incorporate the hierarchical structure of if-then relation types lead to more accurate inference compared to models trained in isolation, as measured by both automatic and human evaluation.

* Accepted to AAAI 2019; 9 pages, 3 figures

You May Not Need Attention

Oct 31, 2018
Ofir Press, Noah A. Smith

In NMT, how far can we get without attention and without separate encoding and decoding? To answer that question, we introduce a recurrent neural translation model that does not use attention and does not have a separate encoder and decoder. Our eager translation model is low-latency, writing target tokens as soon as it reads the first source token, and uses constant memory during decoding. It performs on par with the standard attention-based model of Bahdanau et al. (2014), and better on long sentences.

Neural Models for Documents with Metadata

Oct 23, 2018
Dallas Card, Chenhao Tan, Noah A. Smith

Most real-world document collections involve various types of metadata, such as author, source, and date, and yet the most commonly used approaches to modeling text corpora ignore this information. While specialized models have been developed for particular applications, few are widely used in practice, as customization typically requires derivation of a custom inference algorithm. In this paper, we build on recent advances in variational inference methods and propose a general neural framework, based on topic models, to enable flexible incorporation of metadata and allow for rapid exploration of alternative models. Our approach achieves strong performance, with a manageable tradeoff between perplexity, coherence, and sparsity. Finally, we demonstrate the potential of our framework through an exploration of a corpus of articles about US immigration.

* Dallas Card, Chenhao Tan, and Noah A. Smith. (2018). Neural Models for Documents with Metadata. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
* 13 pages, 3 figures, 6 tables; updating to version published at ACL 2018

Neural Cross-Lingual Named Entity Recognition with Minimal Resources

Sep 11, 2018
Jiateng Xie, Zhilin Yang, Graham Neubig, Noah A. Smith, Jaime Carbonell

For languages with no annotated resources, unsupervised transfer of natural language processing models such as named-entity recognition (NER) from resource-rich languages would be an appealing capability. However, differences in words and word order across languages make it a challenging problem. To improve mapping of lexical items across languages, we propose a method that finds translations based on bilingual word embeddings. To improve robustness to word order differences, we propose to use self-attention, which allows for a degree of flexibility with respect to word order. We demonstrate that these methods achieve state-of-the-art or competitive NER performance on commonly tested languages under a cross-lingual setting, with much lower resource requirements than past approaches. We also evaluate the challenges of applying these methods to Uyghur, a low-resource language.

* EMNLP 2018 long paper

Syntactic Scaffolds for Semantic Structures

Aug 30, 2018
Swabha Swayamdipta, Sam Thomson, Kenton Lee, Luke Zettlemoyer, Chris Dyer, Noah A. Smith

We introduce the syntactic scaffold, an approach to incorporating syntactic information into semantic tasks. Syntactic scaffolds avoid expensive syntactic processing at runtime, only making use of a treebank during training, through a multitask objective. We improve over strong baselines on PropBank semantics, frame semantics, and coreference resolution, achieving competitive performance on all three tasks.

* Accepted at EMNLP 2018

Semantic Matching Against a Corpus: New Applications and Methods

Aug 28, 2018
Lucy H. Lin, Scott Miles, Noah A. Smith

We consider the case of a domain expert who wishes to explore the extent to which a particular idea is expressed in a text collection. We propose the task of semantically matching the idea, expressed as a natural language proposition, against a corpus. We create two preliminary tasks derived from existing datasets, and then introduce a more realistic one on disaster recovery designed for emergency managers, whom we engaged in a user study. On the latter, we find that a new model built from natural language entailment data produces higher-quality matches than simple word-vector averaging, both on expert-crafted queries and on ones produced by the subjects themselves. This work provides a proof-of-concept for such applications of semantic matching and illustrates key challenges.

* 18 pages, 5 figures
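
The "simple word-vector averaging" baseline named in the abstract above can be sketched roughly as follows. This is an illustrative reading rather than the authors' code: the toy 2-D vectors, the whitespace tokenization, the cosine-similarity ranking, and the function names are all assumptions.

```python
import math

def average_vector(tokens, vectors):
    """Mean of the vectors of known tokens, or None if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]

def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_sentences(proposition, sentences, vectors):
    """Rank corpus sentences by similarity to the proposition, best first."""
    query = average_vector(proposition.lower().split(), vectors)
    scored = []
    for sentence in sentences:
        sent_vec = average_vector(sentence.lower().split(), vectors)
        if query is None or sent_vec is None:
            score = 0.0
        else:
            score = cosine(query, sent_vec)
        scored.append((score, sentence))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Hypothetical toy word vectors; real ones would be pretrained embeddings.
toy_vectors = {
    "roads": [1.0, 0.0],
    "bridges": [0.9, 0.1],
    "repaired": [0.8, 0.2],
    "school": [0.0, 1.0],
    "reopened": [0.1, 0.9],
}
ranking = rank_sentences(
    "roads repaired", ["school reopened", "bridges repaired"], toy_vectors
)
```

Averaging discards word order and sentence structure, which may be one reason the abstract reports that a model built from entailment data produces higher-quality matches.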

Rational Recurrences

Aug 28, 2018
Hao Peng, Roy Schwartz, Sam Thomson, Noah A. Smith

Despite the tremendous empirical success of neural models in natural language processing, many of them lack the strong intuitions that accompany classical machine learning approaches. Recently, connections have been shown between convolutional neural networks (CNNs) and weighted finite state automata (WFSAs), leading to new interpretations and insights. In this work, we show that some recurrent neural networks also share this connection to WFSAs. We characterize this connection formally, defining rational recurrences to be recurrent hidden state update functions that can be written as the Forward calculation of a finite set of WFSAs. We show that several recent neural models use rational recurrences. Our analysis provides a fresh view of these models and facilitates devising new neural architectures that draw inspiration from WFSAs. We present one such model, which performs better than two recent baselines on language modeling and text classification. Our results demonstrate that transferring intuitions from classical models like WFSAs can be an effective approach to designing and understanding neural models.

* EMNLP 2018
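
To make the "Forward calculation" in the abstract above concrete, here is a minimal sketch of the Forward algorithm for a generic weighted finite state automaton, written so that the recurrent hidden state update is visible. It is not one of the paper's models: the two-state automaton, the sum-product arithmetic, and the function names are assumptions chosen for illustration.

```python
def forward_step(hidden, transition):
    """One Forward update: new[j] = sum over i of hidden[i] * transition[i][j]."""
    num_states = len(transition[0])
    return [
        sum(hidden[i] * transition[i][j] for i in range(len(hidden)))
        for j in range(num_states)
    ]

def wfsa_score(symbols, initial, transitions, final):
    """Total weight of all accepting paths that read `symbols`."""
    hidden = list(initial)  # plays the role of the recurrent hidden state
    for symbol in symbols:
        hidden = forward_step(hidden, transitions[symbol])
    return sum(h * f for h, f in zip(hidden, final))

# Hypothetical two-state WFSA: state 0 is the start state, state 1 is final.
# Both states self-loop on every symbol with weight 1, and reading "a" can
# also move from state 0 to state 1, so the score counts occurrences of "a".
initial = [1, 0]
final = [0, 1]
transitions = {
    "a": [[1, 1], [0, 1]],
    "b": [[1, 0], [0, 1]],
}

score = wfsa_score("abaa", initial, transitions, final)  # 3
```

Each call to `forward_step` depends only on the previous state vector and the current input symbol, which is the same shape of computation as a recurrent network's hidden state update.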

Greedy, Joint Syntactic-Semantic Parsing with Stack LSTMs

Jul 05, 2018
Swabha Swayamdipta, Miguel Ballesteros, Chris Dyer, Noah A. Smith

We present a transition-based parser that jointly produces syntactic and semantic dependencies. It learns a representation of the entire algorithm state, using stack long short-term memories. Our greedy inference algorithm runs in linear time, including feature extraction. On the CoNLL 2008–9 English shared tasks, we obtain the best published parsing performance among models that jointly learn syntax and semantics.

* Proceedings of CoNLL 2016; 13 pages, 5 figures
