Christopher D. Manning

Semi-Supervised Sequence Modeling with Cross-View Training

Sep 22, 2018

Kevin Clark, Minh-Thang Luong, Christopher D. Manning, Quoc V. Le

Abstract: Unsupervised representation learning algorithms such as word2vec and ELMo improve the accuracy of many supervised NLP models, mainly because they can take advantage of large amounts of unlabeled text. However, the supervised models only learn from task-specific labeled data during the main training phase. We therefore propose Cross-View Training (CVT), a semi-supervised learning algorithm that improves the representations of a Bi-LSTM sentence encoder using a mix of labeled and unlabeled data. On labeled examples, standard supervised learning is used. On unlabeled examples, CVT teaches auxiliary prediction modules that see restricted views of the input (e.g., only part of a sentence) to match the predictions of the full model seeing the whole input. Since the auxiliary modules and the full model share intermediate representations, this in turn improves the full model. Moreover, we show that CVT is particularly effective when combined with multi-task learning. We evaluate CVT on five sequence tagging tasks, machine translation, and dependency parsing, achieving state-of-the-art results.
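The unlabeled-data objective described in this abstract can be illustrated with a small sketch. It assumes the "match the predictions" step is a KL divergence from the full model's output distribution (held fixed as the target) to each auxiliary module's distribution; that choice, the toy logits, and all function names are illustrative assumptions, not the authors' code.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cvt_unlabeled_loss(full_logits, aux_logits_list):
    # Hypothetical helper: the full model's prediction is the fixed target;
    # each restricted-view auxiliary module is penalized for deviating from it.
    target = softmax(full_logits)
    return sum(kl_divergence(target, softmax(aux)) for aux in aux_logits_list)

# Toy example: one token, three labels, two auxiliary views.
full = [2.0, 0.5, -1.0]
aux_views = [[1.5, 0.7, -0.5], [2.0, 0.5, -1.0]]
loss = cvt_unlabeled_loss(full, aux_views)
print(loss)
```

In this sketch the second auxiliary view already agrees with the full model and contributes zero loss, so only the first view would receive a training signal.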

* EMNLP 2018

Textual Analogy Parsing: What's Shared and What's Compared among Analogous Facts

Sep 07, 2018

Matthew Lamm, Arun Tejasvi Chaganty, Christopher D. Manning, Dan Jurafsky, Percy Liang

Abstract: To understand a sentence like "whereas only 10% of White Americans live at or below the poverty line, 28% of African Americans do," it is important not only to identify individual facts, e.g., poverty rates of distinct demographic groups, but also the higher-order relations between them, e.g., the disparity between them. In this paper, we propose the task of Textual Analogy Parsing (TAP) to model this higher-order meaning. The output of TAP is a frame-style meaning representation which explicitly specifies what is shared (e.g., poverty rates) and what is compared (e.g., White Americans vs. African Americans, 10% vs. 28%) between its component facts. Such a meaning representation can enable new applications that rely on discourse understanding such as automated chart generation from quantitative text. We present a new dataset for TAP, baselines, and a model that successfully uses an ILP to enforce the structural constraints of the problem.

* 12 pages including appendix and references. To be presented at EMNLP 2018

CoQA: A Conversational Question Answering Challenge

Aug 21, 2018

Siva Reddy, Danqi Chen, Christopher D. Manning

Abstract: Humans gather information by engaging in conversations involving a series of interconnected questions and answers. For machines to assist in information gathering, it is therefore essential to enable them to answer conversational questions. We introduce CoQA, a novel dataset for building Conversational Question Answering systems. Our dataset contains 127k questions with answers, obtained from 8k conversations about text passages from seven diverse domains. The questions are conversational, and the answers are free-form text with their corresponding evidence highlighted in the passage. We analyze CoQA in depth and show that conversational questions have challenging phenomena not present in existing reading comprehension datasets, e.g., coreference and pragmatic reasoning. We evaluate strong conversational and reading comprehension models on CoQA. The best system obtains an F1 score of 65.1%, which is 23.7 points behind human performance (88.8%), indicating there is ample room for improvement. We launch CoQA as a challenge to the community at http://stanfordnlp.github.io/coqa/
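For readers unfamiliar with F1 on free-form answers, the sketch below shows one common way such a score is computed: word-overlap F1 between a predicted and a reference answer. Whether the CoQA evaluation uses exactly this tokenization and normalization is an assumption here; the function name and example strings are hypothetical.

```python
from collections import Counter

def token_f1(prediction, reference):
    # Word-overlap F1 between two answer strings (lowercased, whitespace-split).
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared word at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

score = token_f1("in the garden", "the garden")
print(round(score, 3))

# The gap quoted in the abstract is a difference of F1 percentages.
gap = round(88.8 - 65.1, 1)
print(gap)
```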

Simpler but More Accurate Semantic Dependency Parsing

Jul 03, 2018

Timothy Dozat, Christopher D. Manning

Abstract: While syntactic dependency annotations concentrate on the surface or functional structure of a sentence, semantic dependency annotations aim to capture between-word relationships that are more closely related to the meaning of a sentence, using graph-structured representations. We extend the LSTM-based syntactic parser of Dozat and Manning (2017) to train on and generate these graph structures. The resulting system on its own achieves state-of-the-art performance, beating the previous, substantially more complex state-of-the-art system by 0.6% labeled F1. Adding linguistically richer input representations pushes the margin even higher, allowing us to beat it by 1.9% labeled F1.

* ACL 2018 short paper

Compositional Attention Networks for Machine Reasoning

Apr 24, 2018

Drew A. Hudson, Christopher D. Manning

Abstract: We present the MAC network, a novel fully differentiable neural network architecture, designed to facilitate explicit and expressive reasoning. MAC moves away from monolithic black-box neural architectures towards a design that encourages both transparency and versatility. The model approaches problems by decomposing them into a series of attention-based reasoning steps, each performed by a novel recurrent Memory, Attention, and Composition (MAC) cell that maintains a separation between control and memory. By stringing the cells together and imposing structural constraints that regulate their interaction, MAC effectively learns to perform iterative reasoning processes that are directly inferred from the data in an end-to-end approach. We demonstrate the model's strength, robustness and interpretability on the challenging CLEVR dataset for visual reasoning, achieving a new state-of-the-art 98.9% accuracy, halving the error rate of the previous best model. More importantly, we show that the model is computationally-efficient and data-efficient, in particular requiring 5x less data than existing models to achieve strong results.

* Published as a conference paper at ICLR 2018

Sentences with Gapping: Parsing and Reconstructing Elided Predicates

Apr 18, 2018

Sebastian Schuster, Joakim Nivre, Christopher D. Manning

Abstract: Sentences with gapping, such as "Paul likes coffee and Mary tea," lack an overt predicate to indicate the relation between two or more arguments. Surface syntax representations of such sentences are often produced poorly by parsers, and even if correct, not well suited to downstream natural language understanding tasks such as relation extraction that are typically designed to extract information from sentences with canonical clause structure. In this paper, we present two methods for parsing to a Universal Dependencies graph representation that explicitly encodes the elided material with additional nodes and edges. We find that both methods can reconstruct elided material from dependency trees with high accuracy when the parser correctly predicts the existence of a gap. We further demonstrate that one of our methods can be applied to other languages based on a case study on Swedish.

* Proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2018)

A Copy-Augmented Sequence-to-Sequence Architecture Gives Good Performance on Task-Oriented Dialogue

Aug 14, 2017

Mihail Eric, Christopher D. Manning

Abstract: Task-oriented dialogue focuses on conversational agents that participate in user-initiated dialogues on domain-specific topics. In contrast to chatbots, which simply seek to sustain open-ended meaningful discourse, existing task-oriented agents usually explicitly model user intent and belief states. This paper examines bypassing such an explicit representation by depending on a latent neural embedding of state and learning selective attention to dialogue history together with copying to incorporate relevant prior context. We complement recent work by showing the effectiveness of simple sequence-to-sequence neural architectures with a copy mechanism. Our model outperforms more complex memory-augmented models by 7% in per-response generation and is on par with the current state-of-the-art on DSTC2.

* 6 pages

Key-Value Retrieval Networks for Task-Oriented Dialogue

Jul 14, 2017

Mihail Eric, Christopher D. Manning

Abstract: Neural task-oriented dialogue systems often struggle to smoothly interface with a knowledge base. In this work, we seek to address this problem by proposing a new neural dialogue agent that is able to effectively sustain grounded, multi-domain discourse through a novel key-value retrieval mechanism. The model is end-to-end differentiable and does not need to explicitly model dialogue state or belief trackers. We also release a new dataset of 3,031 dialogues that are grounded through underlying knowledge bases and span three distinct tasks in the in-car personal assistant space: calendar scheduling, weather information retrieval, and point-of-interest navigation. Our architecture is simultaneously trained on data from all domains and significantly outperforms a competitive rule-based system and other existing neural dialogue architectures on the provided domains according to both automatic and human evaluation metrics.

Arc-swift: A Novel Transition System for Dependency Parsing

May 12, 2017

Peng Qi, Christopher D. Manning

Abstract: Transition-based dependency parsers often need sequences of local shift and reduce operations to produce certain attachments. Correct individual decisions hence require global information about the sentence context and mistakes cause error propagation. This paper proposes a novel transition system, arc-swift, that enables direct attachments between tokens farther apart with a single transition. This allows the parser to leverage lexical information more directly in transition decisions. Hence, arc-swift can achieve significantly better performance with a very small beam size. Our parsers reduce error by 3.7–7.6% relative to those using existing transition systems on the Penn Treebank dependency parsing task and English Universal Dependencies.

* Accepted at ACL 2017

Get To The Point: Summarization with Pointer-Generator Networks

Apr 25, 2017

Abigail See, Peter J. Liu, Christopher D. Manning

Abstract: Neural sequence-to-sequence models have provided a viable new approach for abstractive text summarization (meaning they are not restricted to simply selecting and rearranging passages from the original text). However, these models have two shortcomings: they are liable to reproduce factual details inaccurately, and they tend to repeat themselves. In this work we propose a novel architecture that augments the standard sequence-to-sequence attentional model in two orthogonal ways. First, we use a hybrid pointer-generator network that can copy words from the source text via pointing, which aids accurate reproduction of information, while retaining the ability to produce novel words through the generator. Second, we use coverage to keep track of what has been summarized, which discourages repetition. We apply our model to the CNN / Daily Mail summarization task, outperforming the current abstractive state-of-the-art by at least 2 ROUGE points.
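The hybrid of copying and generating described in this abstract can be pictured as a mixture of two distributions. The sketch below assumes a scalar generation probability `p_gen` that weights the generator's vocabulary distribution against the attention weights over source tokens; the variable names, the dictionary representation, and the toy numbers are illustrative assumptions rather than the authors' implementation.

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    # Generator part: scale every vocabulary probability by p_gen.
    final = {word: p_gen * prob for word, prob in vocab_dist.items()}
    # Pointer part: add (1 - p_gen) * attention mass to each source token,
    # including tokens that are outside the fixed vocabulary.
    for token, weight in zip(source_tokens, attention):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final

# Toy decoder step: "zurich" is out of vocabulary but appears in the source.
vocab_dist = {"the": 0.5, "team": 0.3, "won": 0.2}
source_tokens = ["zurich", "won", "zurich"]
attention = [0.6, 0.3, 0.1]

dist = final_distribution(0.4, vocab_dist, attention, source_tokens)
print(dist)
```

Because both inputs sum to one and the mixture weights sum to one, the result is still a probability distribution, and a source word absent from the vocabulary can receive probability purely through copying.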

* Add METEOR evaluation results, add some citations, fix some equations (what are now equations 1, 8 and 11 were missing a bias term), fix url to pyrouge package, add acknowledgments
