Richard Socher

Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling

Mar 11, 2017
Hakan Inan, Khashayar Khosravi, Richard Socher

Recurrent neural networks have been very successful at predicting sequences of words in tasks such as language modeling. However, all such models are based on the conventional classification framework, where the model is trained against one-hot targets, and each word is represented both as an input and as an output in isolation. This causes inefficiencies in learning both in terms of utilizing all of the information and in terms of the number of parameters needed to train. We introduce a novel theoretical framework that facilitates better learning in language modeling, and show that our framework leads to tying together the input embedding and the output projection matrices, greatly reducing the number of trainable variables. Our framework leads to state of the art performance on the Penn Treebank with a variety of network models.
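The tying described in this abstract can be pictured with a minimal sketch: one matrix serves both as the input embedding table and as the output projection. Everything below (the function names, the tiny sizes, using an embedding row as a stand-in for the recurrent hidden state) is an illustrative assumption, not the authors' implementation, and the loss framework itself is not reproduced here.

```python
import math
import random

def make_embedding(vocab_size, dim, seed=0):
    # Hypothetical helper: a small random embedding matrix, one row per word.
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]

def tied_logits(E, hidden):
    # The output projection reuses the embedding rows: logit_w = E[w] . hidden
    return [sum(e * h for e, h in zip(row, hidden)) for row in E]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [x / z for x in exps]

def count_params(vocab_size, dim, tied):
    # Untied models keep two vocab_size x dim matrices; tied models keep one.
    return vocab_size * dim * (1 if tied else 2)

E = make_embedding(vocab_size=5, dim=3)
hidden = E[2]  # stand-in for an RNN state; a real model would run a recurrent cell here
probs = softmax(tied_logits(E, hidden))
```

With an assumed vocabulary of 10,000 words and 400-dimensional vectors, `count_params` gives 8,000,000 embedding and projection weights for the untied case and 4,000,000 for the tied case.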

A Way out of the Odyssey: Analyzing and Combining Recent Insights for LSTMs

Dec 17, 2016
Shayne Longpre, Sabeek Pradhan, Caiming Xiong, Richard Socher

LSTMs have become a basic building block for many deep NLP models. In recent years, many improvements and variations have been proposed for deep sequence models in general, and LSTMs in particular. We propose and analyze a series of augmentations and modifications to LSTM networks resulting in improved performance for text classification datasets. We observe compounding improvements on traditional LSTMs using Monte Carlo test-time model averaging, average pooling, and residual connections, along with four other suggested modifications. Our analysis provides a simple, reliable, and high quality baseline model.
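As a rough illustration of Monte Carlo test-time model averaging, the sketch below keeps dropout active at prediction time and averages the class probabilities over several random masks. The linear classifier, the weights, and all names are assumptions chosen to keep the example small; the abstract does not specify how the authors apply the technique to their LSTMs.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [x / z for x in exps]

def dropout_forward(x, W, keep_prob, rng):
    # Inverted dropout on the inputs, followed by a linear layer and softmax.
    dropped = [xi / keep_prob if rng.random() < keep_prob else 0.0 for xi in x]
    logits = [sum(w * d for w, d in zip(row, dropped)) for row in W]
    return softmax(logits)

def mc_average(x, W, keep_prob=0.5, n_samples=100, seed=0):
    # Average the predicted distributions over n_samples random dropout masks.
    rng = random.Random(seed)
    total = [0.0] * len(W)
    for _ in range(n_samples):
        probs = dropout_forward(x, W, keep_prob, rng)
        total = [t + p for t, p in zip(total, probs)]
    return [t / n_samples for t in total]

W = [[1.0, -0.5, 0.2], [-1.0, 0.5, 0.3]]  # hypothetical 2-class weights
averaged = mc_average([0.8, 0.1, 0.4], W)
```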

Quasi-Recurrent Neural Networks

Nov 21, 2016
James Bradbury, Stephen Merity, Caiming Xiong, Richard Socher

Recurrent neural networks are a powerful tool for modeling sequential data, but the dependence of each timestep's computation on the previous timestep's output limits parallelism and makes RNNs unwieldy for very long sequences. We introduce quasi-recurrent neural networks (QRNNs), an approach to neural sequence modeling that alternates convolutional layers, which apply in parallel across timesteps, and a minimalist recurrent pooling function that applies in parallel across channels. Despite lacking trainable recurrent layers, stacked QRNNs have better predictive accuracy than stacked LSTMs of the same hidden size. Due to their increased parallelism, they are up to 16 times faster at train and test time. Experiments on language modeling, sentiment classification, and character-level neural machine translation demonstrate these advantages and underline the viability of QRNNs as a basic building block for a variety of sequence tasks.

* Submitted to conference track at ICLR 2017
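The recurrent pooling step can be sketched as follows. The abstract does not give the formula; this assumes the simplest gated variant (often called f-pooling), h_t = f_t * h_{t-1} + (1 - f_t) * z_t, applied independently per channel. In a QRNN the candidate values `z` and forget gates `f` would come from convolutions over the input sequence; here they are supplied directly as hypothetical inputs.

```python
def f_pooling(z, f, h0=None):
    # z, f: lists of timesteps, each a list of per-channel values.
    # Each channel is updated independently, so the inner step has no
    # cross-channel dependence; only the loop over time is sequential.
    channels = len(z[0])
    h = list(h0) if h0 is not None else [0.0] * channels
    states = []
    for z_t, f_t in zip(z, f):
        h = [ft * hp + (1.0 - ft) * zt for ft, hp, zt in zip(f_t, h, z_t)]
        states.append(h)
    return states

z = [[1.0, -1.0], [0.0, 2.0], [1.0, 0.0]]   # candidate values
f = [[0.0, 0.5], [0.5, 0.5], [1.0, 0.0]]    # forget gates in [0, 1]
states = f_pooling(z, f)
# states == [[1.0, -0.5], [0.5, 0.75], [0.5, 0.0]]
```

A gate of 1.0 carries the previous state forward unchanged, and a gate of 0.0 overwrites it with the new candidate, as the last timestep shows for each channel.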

Pointer Sentinel Mixture Models

Sep 26, 2016
Stephen Merity, Caiming Xiong, James Bradbury, Richard Socher

Recent neural network sequence models with softmax classifiers have achieved their best language modeling performance only with very large hidden states and large vocabularies. Even then, they struggle to predict rare or unseen words, even when the context makes the prediction unambiguous. We introduce the pointer sentinel mixture architecture for neural sequence models, which can either reproduce a word from the recent context or produce a word from a standard softmax classifier. Our pointer sentinel-LSTM model achieves state-of-the-art language modeling performance on the Penn Treebank (70.9 perplexity) while using far fewer parameters than a standard softmax LSTM. To evaluate how well language models can exploit longer contexts and deal with more realistic vocabularies and larger corpora, we also introduce the freely available WikiText corpus.
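The mixture between copying from context and the softmax classifier can be sketched as below. This assumes one common formulation in which a sentinel score competes with the context-position scores inside a single softmax, and the sentinel's share of the mass becomes the weight on the vocabulary distribution. The function name, scores, and toy vocabulary are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [x / z for x in exps]

def pointer_sentinel_mix(p_vocab, context_ids, context_scores, sentinel_score):
    # Joint softmax over context positions plus one extra sentinel slot.
    weights = softmax(list(context_scores) + [sentinel_score])
    g = weights[-1]                      # mass given to the softmax classifier
    mixed = [g * p for p in p_vocab]
    # Remaining mass is assigned to the words found at each context position.
    for word_id, a in zip(context_ids, weights[:-1]):
        mixed[word_id] += a
    return mixed, g

p_vocab = [0.7, 0.2, 0.1, 0.0]   # word 3 is "unseen" by the classifier
context_ids = [3, 1, 3]          # word 3 appears twice in the recent context
mixed, g = pointer_sentinel_mix(p_vocab, context_ids, [2.0, 0.0, 2.0], 0.0)
```

In this toy case the classifier assigns word 3 zero probability, yet the mixture gives it substantial mass because it occurs in the recent context.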

Ask Me Anything: Dynamic Memory Networks for Natural Language Processing

Mar 05, 2016
Ankit Kumar, Ozan Irsoy, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, Victor Zhong, Romain Paulus, Richard Socher

Most tasks in natural language processing can be cast into question answering (QA) problems over language input. We introduce the dynamic memory network (DMN), a neural network architecture which processes input sequences and questions, forms episodic memories, and generates relevant answers. Questions trigger an iterative attention process which allows the model to condition its attention on the inputs and the result of previous iterations. These results are then reasoned over in a hierarchical recurrent sequence model to generate answers. The DMN can be trained end-to-end and obtains state-of-the-art results on several types of tasks and datasets: question answering (Facebook's bAbI dataset), text classification for sentiment analysis (Stanford Sentiment Treebank) and sequence modeling for part-of-speech tagging (WSJ-PTB). The training for these different tasks relies exclusively on trained word vector representations and input-question-answer triplets.

Dynamic Memory Networks for Visual and Textual Question Answering

Mar 04, 2016
Caiming Xiong, Stephen Merity, Richard Socher

Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering. One such architecture, the dynamic memory network (DMN), obtained high accuracy on a variety of language tasks. However, it was not shown whether the architecture achieves strong results for question answering when supporting facts are not marked during training or whether it could be applied to other modalities such as images. Based on an analysis of the DMN, we propose several improvements to its memory and input modules. Together with these changes we introduce a novel input module for images in order to be able to answer visual questions. Our new DMN+ model improves the state of the art on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset without supporting fact supervision.

Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks

May 30, 2015
Kai Sheng Tai, Richard Socher, Christopher D. Manning

Because of their superior ability to preserve sequence information over time, Long Short-Term Memory (LSTM) networks, a type of recurrent neural network with a more complex computational unit, have obtained strong results on a variety of sequence modeling tasks. The only underlying LSTM structure that has been explored so far is a linear chain. However, natural language exhibits syntactic properties that would naturally combine words to phrases. We introduce the Tree-LSTM, a generalization of LSTMs to tree-structured network topologies. Tree-LSTMs outperform all existing systems and strong LSTM baselines on two tasks: predicting the semantic relatedness of two sentences (SemEval 2014, Task 1) and sentiment classification (Stanford Sentiment Treebank).

* Accepted for publication at ACL 2015
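To make the idea of an LSTM over a tree concrete, here is a minimal sketch of one possible node update, assuming a child-sum style composition: the input, output, and update gates see the sum of the children's hidden states, while each child gets its own forget gate. The abstract does not state the equations, so the parameter layout, the random initialization, and the `(input_vector, [subtrees])` tree encoding are all assumptions for illustration.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def init_params(in_dim, dim, seed=0):
    # Hypothetical random parameters for the gates i, f, o and the update u.
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    params = {}
    for name in "ifou":
        params["W_" + name] = mat(dim, in_dim)
        params["U_" + name] = mat(dim, dim)
        params["b_" + name] = [0.0] * dim
    return params

def gate(params, name, x, h, squash):
    # squash(W x + U h + b), computed row by row.
    W, U, b = params["W_" + name], params["U_" + name], params["b_" + name]
    return [squash(sum(w * xi for w, xi in zip(Wr, x))
                   + sum(u * hi for u, hi in zip(Ur, h)) + br)
            for Wr, Ur, br in zip(W, U, b)]

def node_forward(params, x, children):
    # children: list of (h_k, c_k) pairs already computed for the child nodes.
    dim = len(params["b_i"])
    h_sum = [sum(h_k[j] for h_k, _ in children) for j in range(dim)]
    i = gate(params, "i", x, h_sum, sigmoid)
    o = gate(params, "o", x, h_sum, sigmoid)
    u = gate(params, "u", x, h_sum, math.tanh)
    c = [ij * uj for ij, uj in zip(i, u)]
    for h_k, c_k in children:
        f_k = gate(params, "f", x, h_k, sigmoid)   # one forget gate per child
        c = [cj + fj * ckj for cj, fj, ckj in zip(c, f_k, c_k)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c

def encode(params, tree):
    # Bottom-up recursion: children are encoded before their parent.
    x, subtrees = tree
    return node_forward(params, x, [encode(params, t) for t in subtrees])

params = init_params(in_dim=2, dim=3)
tree = ([0.5, 0.5], [([1.0, 0.0], []), ([0.0, 1.0], [])])
h_root, c_root = encode(params, tree)
```

A node with exactly one child reduces to an ordinary chain LSTM step, which is the sense in which the tree version generalizes the linear chain.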

Zero-Shot Learning Through Cross-Modal Transfer

Mar 20, 2013
Richard Socher, Milind Ganjoo, Hamsa Sridhar, Osbert Bastani, Christopher D. Manning, Andrew Y. Ng

This work introduces a model that can recognize objects in images even if no training data is available for the objects. The only necessary knowledge about the unseen categories comes from unsupervised large text corpora. In our zero-shot framework distributional information in language can be seen as spanning a semantic basis for understanding what objects look like. Most previous zero-shot learning models can only differentiate between unseen classes. In contrast, our model can both obtain state of the art performance on classes that have thousands of training images and obtain reasonable performance on unseen classes. This is achieved by first using outlier detection in the semantic space and then two separate recognition models. Furthermore, our model does not require any manually defined semantic features for either words or images.
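The two-stage idea (detect outliers in the semantic space, then route to a recognizer for seen or unseen classes) can be sketched in a heavily simplified form. This assumes the image has already been mapped into the word-vector space, uses a plain distance threshold as the outlier test, and lets nearest-prototype lookup stand in for both recognition models. None of these choices, names, or numbers come from the abstract.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(vec, prototypes):
    # prototypes: dict mapping class name -> semantic vector
    return min(prototypes, key=lambda name: dist(vec, prototypes[name]))

def classify(vec, seen, unseen, threshold):
    # Stage 1: treat the point as an outlier if it is far from every seen class.
    closest_seen = nearest(vec, seen)
    if dist(vec, seen[closest_seen]) <= threshold:
        return closest_seen, "seen"
    # Stage 2 for outliers: choose among the unseen classes' word vectors.
    return nearest(vec, unseen), "unseen"

seen = {"dog": [1.0, 0.0], "car": [0.0, 1.0]}       # classes with training images
unseen = {"cat": [0.9, 0.3], "truck": [0.2, 1.2]}   # classes known only from text

near_dog = classify([1.0, 0.05], seen, unseen, threshold=0.2)
novel = classify([0.7, 0.5], seen, unseen, threshold=0.2)
```

Here the first point falls within the threshold of a seen class and is labeled `dog`, while the second is flagged as an outlier and assigned to the closest unseen class, `cat`.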

Learning New Facts From Knowledge Bases With Neural Tensor Networks and Semantic Word Vectors

Mar 16, 2013
Danqi Chen, Richard Socher, Christopher D. Manning, Andrew Y. Ng

Knowledge bases provide applications with the benefit of easily accessible, systematic relational knowledge but often suffer in practice from their incompleteness and lack of knowledge of new entities and relations. Much work has focused on building or extending them by finding patterns in large unannotated text corpora. In contrast, here we mainly aim to complete a knowledge base by predicting additional true relationships between entities, based on generalizations that can be discerned in the given knowledge base. We introduce a neural tensor network (NTN) model which predicts new relationship entries that can be added to the database. This model can be improved by initializing entity representations with word vectors learned in an unsupervised fashion from text, and when doing this, existing relations can even be queried for entities that were not present in the database. Our model generalizes and outperforms existing models for this problem, and can classify unseen relationships in WordNet with an accuracy of 75.8%.
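A relation-specific scoring function of the kind a neural tensor network uses can be sketched as below. The abstract does not give the formula; this assumes the commonly cited form, score = u . tanh(e1^T W[s] e2 + V [e1; e2] + b), with one bilinear slice W[s] per hidden unit. The parameter values and entity vectors are hypothetical toy numbers.

```python
import math

def ntn_score(e1, e2, W, V, b, u):
    # W: k slices of d x d, V: k rows of length 2d, b and u: length k.
    d = len(e1)
    hidden = []
    for s in range(len(u)):
        bilinear = sum(e1[i] * W[s][i][j] * e2[j]
                       for i in range(d) for j in range(d))
        linear = sum(v * x for v, x in zip(V[s], e1 + e2))
        hidden.append(math.tanh(bilinear + linear + b[s]))
    return sum(us * hs for us, hs in zip(u, hidden))

# One relation with a single identity slice and no linear term (toy values).
W = [[[1.0, 0.0], [0.0, 1.0]]]
V = [[0.0, 0.0, 0.0, 0.0]]
b = [0.0]
u = [1.0]

aligned = ntn_score([1.0, 0.0], [1.0, 0.0], W, V, b, u)     # tanh(1)
orthogonal = ntn_score([1.0, 0.0], [0.0, 1.0], W, V, b, u)  # tanh(0) = 0
```

In a trained model each relation would have its own W, V, b, and u, and a candidate triple would be accepted as a new fact when its score exceeds a relation-specific threshold; that thresholding step is an assumption and is not shown here.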
