Chris Dyer

Recurrent Neural Network Grammars

Oct 12, 2016
Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, Noah A. Smith

We introduce recurrent neural network grammars, probabilistic models of sentences with explicit phrase structure. We explain efficient inference procedures that allow application to both parsing and language modeling. Experiments show that they provide better parsing in English than any single previously published supervised generative model and better language modeling than state-of-the-art sequential RNNs in English and Chinese.

* Proceedings of NAACL 2016 (contains corrigendum)

Semantic Parsing with Semi-Supervised Sequential Autoencoders

Sep 29, 2016
Tomáš Kočiský, Gábor Melis, Edward Grefenstette, Chris Dyer, Wang Ling, Phil Blunsom, Karl Moritz Hermann

We present a novel semi-supervised approach for sequence transduction and apply it to semantic parsing. The unsupervised component is based on a generative model in which latent sentences generate the unpaired logical forms. We apply this method to a number of semantic parsing tasks focusing on domains with limited access to labelled training data and extend those datasets with synthetically generated logical forms.

Generalizing and Hybridizing Count-based and Neural Language Models

Sep 26, 2016
Graham Neubig, Chris Dyer

Language models (LMs) are statistical models that calculate probabilities over sequences of words or other discrete symbols. Currently two major paradigms for language modeling exist: count-based n-gram models, which have advantages of scalability and test-time speed, and neural LMs, which often achieve superior modeling performance. We demonstrate how both varieties of models can be unified in a single modeling framework that defines a set of probability distributions over the vocabulary of words, and then dynamically calculates mixture weights over these distributions. This formulation allows us to create novel hybrid models that combine the desirable features of count-based and neural LMs, and experiments demonstrate the advantages of these approaches.
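The unified view described in this abstract can be illustrated with a minimal sketch: several distributions over the same vocabulary are combined using mixture weights computed per context. The softmax over raw scores and the names `mix_distributions`, `ngram`, and `neural` are assumptions made for illustration, not the authors' implementation.

```python
import math

def softmax(scores):
    """Turn unnormalized scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mix_distributions(distributions, weight_scores):
    """Interpolate K distributions over one shared vocabulary.

    distributions: list of K dicts mapping word -> probability.
    weight_scores: K unnormalized scores; in a real model these would
    be computed from the context (assumption: passed in directly here).
    """
    weights = softmax(weight_scores)
    vocab = distributions[0].keys()
    return {
        w: sum(lam * dist[w] for lam, dist in zip(weights, distributions))
        for w in vocab
    }

# Toy next-word distributions from two hypothetical component models.
ngram = {"cat": 0.7, "dog": 0.2, "fish": 0.1}
neural = {"cat": 0.3, "dog": 0.5, "fish": 0.2}

# Equal scores give equal mixture weights.
mixed = mix_distributions([ngram, neural], [0.0, 0.0])
```

Because the weights are non-negative and sum to one, the mixture is itself a valid probability distribution whenever each component is.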

* Presented at EMNLP 2016

Distilling an Ensemble of Greedy Dependency Parsers into One MST Parser

Sep 24, 2016
Adhiguna Kuncoro, Miguel Ballesteros, Lingpeng Kong, Chris Dyer, Noah A. Smith

We introduce two first-order graph-based dependency parsers achieving a new state of the art. The first is a consensus parser built from an ensemble of independently trained greedy LSTM transition-based parsers with different random initializations. We cast this approach as minimum Bayes risk decoding (under the Hamming cost) and argue that weaker consensus within the ensemble is a useful signal of difficulty or ambiguity. The second parser is a "distillation" of the ensemble into a single model. We train the distillation parser using a structured hinge loss objective with a novel cost that incorporates ensemble uncertainty estimates for each possible attachment, thereby avoiding the intractable cross-entropy computations required by applying standard distillation objectives to problems with structured outputs. The first-order distillation parser matches or surpasses the state of the art on English, Chinese, and German.
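The consensus idea can be sketched as per-token voting over the heads predicted by each ensemble member, with the vote share serving as a rough agreement signal. This is a simplified illustration only: it picks the majority head for each token independently, which minimizes expected Hamming cost when no tree constraint is imposed, whereas a graph-based parser would decode a well-formed tree from the vote scores. The function name `vote_heads` and the toy predictions are hypothetical.

```python
from collections import Counter

def vote_heads(ensemble_predictions):
    """Majority-vote a head for each token.

    ensemble_predictions: one list per model; entry i is the predicted
    head index of token i (0 denotes the root, tokens are 1-based).
    Returns the winning heads and the fraction of models that agreed.
    """
    n_models = len(ensemble_predictions)
    n_tokens = len(ensemble_predictions[0])
    heads, agreement = [], []
    for i in range(n_tokens):
        votes = Counter(pred[i] for pred in ensemble_predictions)
        head, count = votes.most_common(1)[0]
        heads.append(head)
        agreement.append(count / n_models)
    return heads, agreement

# Three hypothetical parsers on a three-token sentence.
predictions = [
    [2, 0, 2],
    [2, 0, 2],
    [2, 0, 1],  # this model disagrees on the last token
]
heads, agreement = vote_heads(predictions)
```

Low agreement on a token (here, the last one) is the kind of signal the abstract describes as indicating difficulty or ambiguity.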

* 10 pages. To appear at EMNLP 2016

Training with Exploration Improves a Greedy Stack-LSTM Parser

Sep 13, 2016
Miguel Ballesteros, Yoav Goldberg, Chris Dyer, Noah A. Smith

We adapt the greedy Stack-LSTM dependency parser of Dyer et al. (2015) to support a training-with-exploration procedure using dynamic oracles (Goldberg and Nivre, 2013) instead of cross-entropy minimization. This form of training, which accounts for model predictions at training time rather than assuming an error-free action history, improves parsing accuracies for both English and Chinese, obtaining very strong results for both languages. We discuss some modifications needed in order to get training with exploration to work well for a probabilistic neural network.

* In proceedings of EMNLP 2016

Many Languages, One Parser

Jul 26, 2016
Waleed Ammar, George Mulcaire, Miguel Ballesteros, Chris Dyer, Noah A. Smith

We train one multilingual model for dependency parsing and use it to parse sentences in several languages. The parsing model uses (i) multilingual word clusters and embeddings; (ii) token-level language information; and (iii) language-specific features (fine-grained POS tags). This input representation enables the parser not only to parse effectively in multiple languages, but also to generalize across languages based on linguistic universals and typological similarities, making it more effective to learn from limited annotations. Our parser's performance compares favorably to strong baselines in a range of data scenarios, including when the target language has a large treebank, a small treebank, or no treebank for training.

Neural Machine Translation with Recurrent Attention Modeling

Jul 18, 2016
Zichao Yang, Zhiting Hu, Yuntian Deng, Chris Dyer, Alex Smola

Knowing which words have been attended to in previous time steps while generating a translation is a rich source of information for predicting what words will be attended to in the future. We improve upon the attention model of Bahdanau et al. (2014) by explicitly modeling the relationship between previous and subsequent attention levels for each word using one recurrent network per input word. This architecture easily captures informative features, such as fertility and regularities in relative distortion. In experiments, we show our parameterization of attention improves translation quality.

Problems With Evaluation of Word Embeddings Using Word Similarity Tasks

Jun 22, 2016
Manaal Faruqui, Yulia Tsvetkov, Pushpendre Rastogi, Chris Dyer

Lacking standardized extrinsic evaluation methods for vector representations of words, the NLP community has relied heavily on word similarity tasks as a proxy for intrinsic evaluation of word vectors. Word similarity evaluation, which correlates the distance between vectors and human judgments of semantic similarity, is attractive because it is computationally inexpensive and fast. In this paper we present several problems associated with the evaluation of word vectors on word similarity datasets, and summarize existing solutions. Our study suggests that the use of word similarity tasks for evaluation of word vectors is not sustainable and calls for further research on evaluation methods.
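For readers unfamiliar with the evaluation being critiqued, a minimal sketch follows: cosine similarity is computed for each word pair and rank-correlated (Spearman) with human ratings. The vectors, ratings, and function names are invented for illustration, and the rank function assumes no tied values.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(xs, ys):
    """Spearman rank correlation (tie-free formula)."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical 2-d word vectors and human similarity ratings.
vectors = {
    "cat": [1.0, 0.1],
    "dog": [0.9, 0.2],
    "car": [0.1, 1.0],
    "bus": [0.3, 0.9],
}
human = {("cat", "dog"): 9.0, ("car", "bus"): 8.5, ("cat", "car"): 2.0}

pairs = list(human)
model_sims = [cosine(vectors[a], vectors[b]) for a, b in pairs]
rho = spearman(model_sims, [human[p] for p in pairs])
```

The resulting `rho` is the single score typically reported per dataset, which is part of why the procedure is cheap to run.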

* The First Workshop on Evaluating Vector Space Representations for NLP

Correlation-based Intrinsic Evaluation of Word Vector Representations

Jun 21, 2016
Yulia Tsvetkov, Manaal Faruqui, Chris Dyer

We introduce QVEC-CCA, an intrinsic evaluation metric for word vector representations based on correlations of learned vectors with features extracted from linguistic resources. We show that QVEC-CCA scores are an effective proxy for a range of extrinsic semantic and syntactic tasks. We also show that the proposed evaluation obtains higher and more consistent correlations with downstream tasks, compared to existing approaches to intrinsic evaluation of word vectors that are based on word similarity.

* RepEval 2016, 5 pages

Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning

Jun 21, 2016
Yulia Tsvetkov, Manaal Faruqui, Wang Ling, Brian MacWhinney, Chris Dyer

We use Bayesian optimization to learn curricula for word representation learning, optimizing performance on downstream tasks that depend on the learned representations as features. The curricula are modeled by a linear ranking function which is the scalar product of a learned weight vector and an engineered feature vector that characterizes the different aspects of the complexity of each instance in the training corpus. We show that learning the curriculum improves performance on a variety of downstream tasks over random orders and in comparison to the natural corpus order.
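The linear ranking function mentioned in this abstract can be sketched as a dot product between a weight vector and per-instance complexity features, followed by a sort. The two features (sentence length, rare-word fraction), the fixed weights, and the ascending order are assumptions for illustration; in the described setting the weights would be chosen by Bayesian optimization, which is not shown here.

```python
def order_by_curriculum(instances, features, weights):
    """Sort training instances by a linear complexity score.

    features: dict mapping instance -> list of feature values.
    weights: weight vector of the same length as each feature list.
    Lower-scoring instances come first (assumed ordering).
    """
    def score(instance):
        return sum(w * f for w, f in zip(weights, features[instance]))
    return sorted(instances, key=score)

# Hypothetical features: [length in tokens, fraction of rare words].
features = {
    "a b": [2.0, 0.0],
    "a rare word here": [4.0, 0.25],
    "x": [1.0, 1.0],
}
weights = [1.0, 2.0]  # placeholder values, not learned

curriculum = order_by_curriculum(list(features), features, weights)
```

An outer optimization loop would propose different `weights`, train word representations on the resulting order, and keep the weights that score best on the downstream task.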

* In proceedings of ACL 2016, 10 pages
