Yoav Goldberg

On-the-fly Operation Batching in Dynamic Computation Graphs

May 22, 2017
Graham Neubig, Yoav Goldberg, Chris Dyer

Dynamic neural network toolkits such as PyTorch, DyNet, and Chainer offer more flexibility for implementing models that cope with data of varying dimensions and structure, relative to toolkits that operate on statically declared computations (e.g., TensorFlow, CNTK, and Theano). However, existing toolkits, both static and dynamic, require that the developer organize the computations into the batches necessary for exploiting high-performance algorithms and hardware. This batching task is generally difficult, and it becomes a major hurdle as architectures grow more complex. In this paper, we present an algorithm, and its implementation in the DyNet toolkit, for automatically batching operations. Developers simply write minibatch computations as aggregations of single-instance computations, and the batching algorithm seamlessly executes them, on the fly, using computationally efficient batched operations. On a variety of tasks, we obtain throughput similar to that obtained with manual batches, as well as comparable speedups over single-instance learning on architectures that are impractical to batch manually.
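
The core idea of automatic batching, finding operations that do not depend on each other and share the same type and shape so they can run as one kernel call, can be illustrated with a minimal sketch. It assumes a toy graph representation (the `Node` class, the op names, and `batch_schedule` are all hypothetical, not DyNet's API) and uses the simplest possible heuristic, grouping nodes by their depth in the graph; it is not the scheduler described in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Dict, List, Tuple

BatchKey = Tuple[str, Tuple[int, ...]]  # (op type, output shape)


@dataclass(frozen=True)
class Node:
    """One operation in a toy computation graph (hypothetical, not DyNet's API)."""
    name: str                 # unique id of this node
    op: str                   # operation type, e.g. "input", "matvec", "tanh"
    inputs: Tuple[str, ...]   # names of the nodes this one consumes
    shape: Tuple[int, ...]    # output shape; only same-shape ops share a batch


def batch_schedule(nodes: List[Node]) -> List[List[List[str]]]:
    """Group independent, compatible operations into batches.

    Nodes must be listed in topological order (inputs before consumers).
    Returns one stage per depth; each stage holds batches of node names
    that share (op, shape) and could run as a single batched kernel.
    """
    depth: Dict[str, int] = {}
    for n in nodes:
        # A node is ready once all of its inputs have been computed.
        depth[n.name] = 1 + max((depth[i] for i in n.inputs), default=-1)

    stages: DefaultDict[int, DefaultDict[BatchKey, List[str]]] = defaultdict(lambda: defaultdict(list))
    for n in nodes:
        stages[depth[n.name]][(n.op, n.shape)].append(n.name)
    return [list(stages[d].values()) for d in sorted(stages)]


if __name__ == "__main__":
    # Two training instances written one at a time: y_k = tanh(W x_k).
    graph = [
        Node("x1", "input", (), (4,)),
        Node("x2", "input", (), (4,)),
        Node("h1", "matvec", ("x1",), (3,)),
        Node("h2", "matvec", ("x2",), (3,)),
        Node("y1", "tanh", ("h1",), (3,)),
        Node("y2", "tanh", ("h2",), (3,)),
    ]
    print(batch_schedule(graph))
```

Running it prints `[[['x1', 'x2']], [['h1', 'h2']], [['y1', 'y2']]]`: each pair of per-instance operations collapses into one batched call. Pure depth-based grouping can miss batching opportunities when instances differ in length or structure, which is why a practical scheduler needs a smarter strategy than this sketch.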

Morphological Inflection Generation with Hard Monotonic Attention

Apr 11, 2017
Roee Aharoni, Yoav Goldberg

We present a neural model for morphological inflection generation which employs a hard attention mechanism, inspired by the nearly monotonic alignment commonly found between the characters in a word and the characters in its inflection. We evaluate the model on three previously studied morphological inflection generation datasets and show that it provides state-of-the-art results in various setups compared to previous neural and non-neural approaches. Finally, we present an analysis of the continuous representations learned by both the hard and soft attention (Bahdanau et al., 2014) models for the task, shedding some light on the features such models extract.

* Accepted as a long paper in ACL 2017
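
Unlike soft attention, which weights all input characters at every decoding step, hard monotonic attention looks at exactly one input character at a time and can only move forward. A minimal sketch of that control flow, assuming the decoder's output vocabulary is the character set plus a special pointer-advancing action (called `STEP` here, a hypothetical name) and that training actions are derived from a given monotonic character alignment. The neural encoder, decoder, and action classifier are omitted, and the helper names are illustrative only.

```python
from typing import List, Sequence, Tuple

STEP = "<step>"  # hypothetical symbol for "advance the attention pointer"


def actions_from_alignment(src: str, tgt: str, align: Sequence[int]) -> List[str]:
    """Convert a monotonic character alignment into write/step actions.

    align[j] is the index of the source character that target character j
    is aligned to; the sequence must be non-decreasing.
    """
    assert len(align) == len(tgt)
    assert all(0 <= a < len(src) for a in align), "alignment index out of range"
    assert all(a <= b for a, b in zip(align, align[1:])), "alignment must be monotonic"
    actions: List[str] = []
    j = 0
    for i in range(len(src)):
        # Emit every target character aligned to source position i ...
        while j < len(tgt) and align[j] == i:
            actions.append(tgt[j])
            j += 1
        # ... then move the pointer to the next source character.
        if i < len(src) - 1:
            actions.append(STEP)
    return actions


def run_actions(src: str, actions: Sequence[str]) -> Tuple[str, List[Tuple[str, str]]]:
    """Replay actions; return the output and (attended char, written char) pairs."""
    ptr, out, trace = 0, [], []
    for a in actions:
        if a == STEP:
            ptr += 1
        else:
            out.append(a)
            trace.append((src[ptr], a))  # hard attention: exactly one source char
    return "".join(out), trace


if __name__ == "__main__":
    acts = actions_from_alignment("fly", "flies", [0, 1, 2, 2, 2])
    print(acts)
    print(run_actions("fly", acts))
```

For the illustrative pair "fly" to "flies", the actions are `['f', '<step>', 'l', '<step>', 'i', 'e', 's']`, and replaying them shows the suffix characters all written while attending to the final "y".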

The Interplay of Semantics and Morphology in Word Embeddings

Apr 06, 2017
Oded Avraham, Yoav Goldberg

We explore the ability of word embeddings to capture both semantic and morphological similarity, as affected by the different types of linguistic properties (surface form, lemma, morphological tag) used to compose the representation of each word. We train several models, each of which uses a different subset of these properties to compose its representations. By evaluating the models on semantic and morphological measures, we reveal useful insights into the relationship between semantics and morphology.

Improving Reliability of Word Similarity Evaluation by Redesigning Annotation Task and Performance Measure

Feb 27, 2017
Oded Avraham, Yoav Goldberg

We suggest a new method for creating and using gold-standard datasets for word similarity evaluation. Our goal is to improve the reliability of the evaluation, and we do this by redesigning the annotation task to achieve higher inter-rater agreement, and by defining a performance measure which takes the reliability of each annotation decision in the dataset into account.

Improving a Strong Neural Parser with Conjunction-Specific Features

Feb 22, 2017
Jessica Ficler, Yoav Goldberg

While dependency parsers reach very high overall accuracy, some dependency relations are much harder than others. In particular, dependency parsers perform poorly on coordination constructions (i.e., correctly attaching the "conj" relation). We extend a state-of-the-art dependency parser with conjunction-specific features, focusing on the similarity between the conjuncts' head words. Training the extended parser yields an improvement in "conj" attachment as well as in overall dependency parsing accuracy on the Stanford dependency conversion of the Penn Treebank.

* EACL 2017

Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks

Feb 09, 2017
Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer Lavi, Yoav Goldberg

There is considerable research interest in encoding variable-length sentences into fixed-length vectors in a way that preserves their meaning. Two common methods are representations based on averaging word vectors and representations based on the hidden states of recurrent neural networks such as LSTMs. The sentence vectors are used as features for subsequent machine learning tasks or for pre-training in the context of deep learning. However, little is known about the properties encoded in these sentence representations and about the linguistic information they capture. We propose a framework that facilitates a better understanding of the encoded representations. We define prediction tasks around isolated aspects of sentence structure (namely sentence length, word content, and word order), and score representations by the ability to train a classifier to solve each prediction task when using the representation as input. We demonstrate the potential contribution of the approach by analyzing different sentence representation mechanisms. The analysis sheds light on the relative strengths of different sentence embedding methods with respect to these low-level prediction tasks, and on the effect of the encoded vector's dimensionality on the resulting representations.
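
Each of the three auxiliary tasks turns a raw sentence into labeled examples; the classifier then sees only the sentence embedding (plus the probed words) and must recover the label. A minimal sketch of the label construction, assuming whitespace-tokenized sentences, a caller-chosen list of length-bin boundaries, and one sampled example per task; the function names and sampling scheme are illustrative, not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple


def length_label(sent: Sequence[str], bin_uppers: Sequence[int]) -> int:
    """Length task: index of the first bin whose upper bound covers len(sent)."""
    for k, upper in enumerate(bin_uppers):
        if len(sent) <= upper:
            return k
    return len(bin_uppers)  # longer than every bin boundary


def content_examples(sent: Sequence[str], vocab: Sequence[str],
                     rng: random.Random) -> List[Tuple[str, bool]]:
    """Word-content task: one word that occurs in sent (True), one that does not (False)."""
    present = set(sent)
    pos = rng.choice(sorted(present))
    neg = rng.choice([w for w in vocab if w not in present])
    return [(pos, True), (neg, False)]


def order_example(sent: Sequence[str], rng: random.Random) -> Tuple[str, str, bool]:
    """Word-order task: does the first word occur before the second?

    Assumes the sentence tokens are distinct; otherwise the label is ambiguous.
    """
    i, j = rng.sample(range(len(sent)), 2)
    return sent[i], sent[j], i < j


if __name__ == "__main__":
    rng = random.Random(0)
    s = "a cat sat on my mat".split()
    print(length_label(s, [5, 8, 12]))
    print(content_examples(s, ["a", "cat", "dog", "tree"], rng))
    print(order_example(s, rng))
```

Because the labels are computed from surface properties alone, the tasks isolate what the fixed-length vector retains, independently of any downstream semantic task.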

DyNet: The Dynamic Neural Network Toolkit

Jan 15, 2017
Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin

We describe DyNet, a toolkit for implementing neural network models based on dynamic declaration of network structure. In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its derivatives. In DyNet's dynamic declaration strategy, computation graph construction is mostly transparent, being implicitly constructed by executing procedural code that computes the network outputs, and the user is free to use different network structures for each input. Dynamic declaration thus facilitates the implementation of more complicated network architectures, and DyNet is specifically designed to allow users to implement their models in a way that is idiomatic in their preferred programming language (C++ or Python). One challenge with dynamic declaration is that, because the symbolic computation graph is defined anew for every training example, its construction must have low overhead. To achieve this, DyNet has an optimized C++ backend and a lightweight graph representation. Experiments show that DyNet is faster than, or comparable to, static declaration toolkits, and significantly faster than Chainer, another dynamic declaration toolkit. DyNet is released open-source under the Apache 2.0 license and available at http://github.com/clab/dynet.

* 33 pages
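
The difference between static and dynamic declaration is easiest to see in a toy setting. The sketch below is a scalar reverse-mode autodiff in plain Python; it is not DyNet and uses none of its API, and all names (`Expr`, `tanh`, `backward`, `encode`) are hypothetical. The "graph" is simply the set of `Expr` objects created while ordinary code runs, so a recursive encoder builds a differently shaped graph for every input tree, and that per-example construction is exactly the overhead the abstract says must be kept low.

```python
import math
from typing import List, Tuple, Union


class Expr:
    """A scalar graph node created as the forward code runs (toy, not DyNet's API)."""

    def __init__(self, value: float, parents: Tuple[Tuple["Expr", float], ...] = ()):
        self.value = value
        self.parents = parents  # pairs of (input node, d(self)/d(input))
        self.grad = 0.0

    def __add__(self, other: "Expr") -> "Expr":
        return Expr(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other: "Expr") -> "Expr":
        return Expr(self.value * other.value, ((self, other.value), (other, self.value)))


def tanh(x: Expr) -> Expr:
    t = math.tanh(x.value)
    return Expr(t, ((x, 1.0 - t * t),))


def backward(out: Expr) -> None:
    """Reverse-mode differentiation over whatever graph the forward pass built."""
    order: List[Expr] = []
    seen = set()

    def visit(n: Expr) -> None:  # post-order DFS: inputs before consumers
        if id(n) in seen:
            return
        seen.add(id(n))
        for p, _ in n.parents:
            visit(p)
        order.append(n)

    visit(out)
    out.grad = 1.0
    for n in reversed(order):  # consumers before their inputs
        for p, local in n.parents:
            p.grad += local * n.grad


Tree = Union[float, Tuple["Tree", "Tree"]]


def encode(tree: Tree, w: Expr) -> Expr:
    """Graph shape follows the input: one tanh per internal tree node."""
    if isinstance(tree, tuple):
        left, right = tree
        return tanh(encode(left, w) + encode(right, w))
    return w * Expr(float(tree))


if __name__ == "__main__":
    w = Expr(0.5)  # a parameter shared across examples
    for example in [(1.0, 2.0), ((1.0, 2.0), 3.0)]:
        w.grad = 0.0  # fresh gradient for each example's fresh graph
        out = encode(example, w)
        backward(out)
        print(f"value={out.value:.4f}  dout/dw={w.grad:.4f}")
```

The two examples produce graphs of different depth from the same `encode` function, with no separate graph-definition step; a static toolkit would instead need the tree structure expressed symbolically ahead of time.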

A Strong Baseline for Learning Cross-Lingual Word Embeddings from Sentence Alignments

Jan 09, 2017
Omer Levy, Anders Søgaard, Yoav Goldberg

While cross-lingual word embeddings have been studied extensively in recent years, the qualitative differences between the various algorithms remain unclear. We observe that whether or not an algorithm uses a particular feature set (sentence IDs) accounts for a significant performance gap among these algorithms. This feature set is also used by traditional alignment algorithms, such as IBM Model-1, which demonstrate similar performance to state-of-the-art embedding algorithms on a variety of benchmarks. Overall, we observe that different algorithmic approaches for utilizing the sentence ID feature space result in similar performance. This paper draws both empirical and theoretical parallels between the embedding and alignment literature, and suggests that adding sources of information that go beyond the traditional signal of bilingual sentence-aligned corpora may substantially improve cross-lingual word embeddings, and that future baselines should at least take such features into account.

* EACL 2017
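
The sentence ID feature set represents each word by the IDs of the aligned sentence pairs it occurs in, so a source word and a target word look similar when they tend to appear in the same sentence pairs. A minimal sketch of that feature space, using the Dice coefficient as a simple association score; the choice of Dice, the toy corpus, and the helper names are assumptions for illustration, not the paper's algorithms.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def sentence_id_features(corpus: List[List[str]]) -> Dict[str, Set[int]]:
    """Map each word to the set of IDs of the sentences it occurs in."""
    feats: Dict[str, Set[int]] = defaultdict(set)
    for sid, sentence in enumerate(corpus):
        for word in sentence:
            feats[word].add(sid)
    return dict(feats)


def dice(a: Set[int], b: Set[int]) -> float:
    """Dice coefficient: 2 * |A & B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 0.0


def best_translation(word: str, src: Dict[str, Set[int]],
                     tgt: Dict[str, Set[int]]) -> Tuple[str, float]:
    """Target word whose sentence-ID set overlaps most with that of `word`."""
    return max(((t, dice(src[word], ids)) for t, ids in tgt.items()),
               key=lambda pair: pair[1])


if __name__ == "__main__":
    # Sentence i of the source side is aligned with sentence i of the target side.
    en = [["the", "dog"], ["the", "cat"], ["a", "dog"]]
    fr = [["le", "chien"], ["le", "chat"], ["un", "chien"]]
    src, tgt = sentence_id_features(en), sentence_id_features(fr)
    for w in ["dog", "cat", "the"]:
        print(w, best_translation(w, src, tgt))
```

On this toy corpus every source word is paired with its translation at a score of 1.0, using nothing but co-occurrence in aligned sentences, which is the signal the abstract identifies as shared by embedding and alignment methods.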

Semi Supervised Preposition-Sense Disambiguation using Multilingual Data

Nov 27, 2016
Hila Gonen, Yoav Goldberg

Prepositions are very common and very ambiguous, and understanding their sense is critical for understanding the meaning of the sentence. Supervised corpora for the preposition-sense disambiguation task are small, suggesting a semi-supervised approach to the task. We show that signals from unannotated multilingual data can be used to improve supervised preposition-sense disambiguation. Our approach pre-trains an LSTM encoder for predicting the translation of a preposition, and then incorporates the pre-trained encoder as a component in a supervised classification system, and fine-tunes it for the task. The multilingual signals consistently improve results on two preposition-sense datasets.

* 12 pages; COLING 2016

Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies

Nov 04, 2016
Tal Linzen, Emmanuel Dupoux, Yoav Goldberg

The success of long short-term memory (LSTM) neural networks in language processing is typically attributed to their ability to capture long-distance statistical regularities. Linguistic regularities are often sensitive to syntactic structure; can such dependencies be captured by LSTMs, which do not have explicit structural representations? We begin addressing this question using number agreement in English subject-verb dependencies. We probe the architecture's grammatical competence both using training objectives with an explicit grammatical target (number prediction, grammaticality judgments) and using language models. In the strongly supervised settings, the LSTM achieved very high overall accuracy (less than 1% errors), but errors increased when sequential and structural information conflicted. The frequency of such errors rose sharply in the language-modeling setting. We conclude that LSTMs can capture a non-trivial amount of grammatical structure given targeted supervision, but stronger architectures may be required to further reduce errors; furthermore, the language modeling signal is insufficient for capturing syntax-sensitive dependencies, and should be supplemented with more direct supervision if such dependencies need to be captured.

* 15 pages; to appear in Transactions of the Association for Computational Linguistics
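
In the number prediction setting, the model reads a sentence up to, but not including, the verb and must predict whether the verb is singular or plural; the interesting cases are those where nouns of the opposite number (agreement "attractors") intervene between subject and verb, so sequential and structural cues conflict. A minimal sketch of building one such example, assuming Penn Treebank part-of-speech tags and that the subject and verb positions are already known (e.g., from a parser); the function name and tag table are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

# Penn Treebank noun tags and the grammatical number they mark.
NOUN_NUMBER: Dict[str, str] = {
    "NN": "singular", "NNP": "singular",
    "NNS": "plural", "NNPS": "plural",
}


def number_prediction_example(tagged: Sequence[Tuple[str, str]], subj: int,
                              verb: int) -> Tuple[List[str], str, int]:
    """Return (prefix before the verb, subject's number, number of attractors)."""
    words = [w for w, _ in tagged]
    label = NOUN_NUMBER[tagged[subj][1]]
    # Attractors: nouns between subject and verb whose number differs from the subject's.
    attractors = sum(
        1 for _, tag in tagged[subj + 1:verb]
        if tag in NOUN_NUMBER and NOUN_NUMBER[tag] != label
    )
    return words[:verb], label, attractors


if __name__ == "__main__":
    sent = [("the", "DT"), ("keys", "NNS"), ("to", "TO"), ("the", "DT"),
            ("cabinet", "NN"), ("are", "VBP"), ("on", "IN"), ("the", "DT"),
            ("table", "NN")]
    print(number_prediction_example(sent, subj=1, verb=5))
```

For "the keys to the cabinet are on the table", the model sees only "the keys to the cabinet", the target is plural, and "cabinet" counts as one attractor: a model relying on the most recent noun would predict the wrong number.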