Edward Grefenstette

Discovering Discrete Latent Topics with Neural Variational Inference

May 21, 2018
Yishu Miao, Edward Grefenstette, Phil Blunsom

Topic models have been widely explored as probabilistic generative models of documents. Traditional inference methods have sought closed-form derivations for updating the models; however, as the expressiveness of these models grows, so does the difficulty of performing fast and accurate inference over their parameters. This paper presents alternative neural approaches to topic modelling by providing parameterisable distributions over topics that permit training by backpropagation in the framework of neural variational inference. In addition, with the help of a stick-breaking construction, we propose a recurrent network that is able to discover a notionally unbounded number of topics, analogous to Bayesian non-parametric topic models. Experimental results on the MXM Song Lyrics, 20NewsGroups and Reuters News datasets demonstrate the effectiveness and efficiency of these neural topic models.

* ICML 2017
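
The stick-breaking construction mentioned in the abstract can be illustrated with a small sketch. It assumes the break proportions are already given (in the models the abstract describes they would come from the network; here they are plain numbers), and the function name `stick_breaking` is hypothetical. Each proportion takes a fraction of whatever stick remains, and the leftover piece becomes the final weight, so the weights always sum to one.

```python
from typing import List


def stick_breaking(proportions: List[float]) -> List[float]:
    """Map K-1 break proportions in (0, 1) to K weights that sum to 1.

    Each proportion takes that fraction of the stick still remaining;
    the final weight is whatever stick is left over.
    """
    weights: List[float] = []
    remaining = 1.0
    for eta in proportions:
        if not 0.0 < eta < 1.0:
            raise ValueError("break proportions must lie strictly in (0, 1)")
        weights.append(remaining * eta)
        remaining *= 1.0 - eta
    weights.append(remaining)  # leftover stick closes the distribution
    return weights


print(stick_breaking([0.5, 0.5, 0.5]))  # [0.5, 0.25, 0.125, 0.125]
```

Appending a further proportion splits only the leftover piece and leaves the earlier weights unchanged, which is what lets a recurrent network keep emitting proportions for an open-ended number of topics.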

Learning to Compute Word Embeddings On the Fly

Mar 07, 2018
Dzmitry Bahdanau, Tom Bosc, Stanisław Jastrzębski, Edward Grefenstette, Pascal Vincent, Yoshua Bengio

Words in natural language follow a Zipfian distribution whereby some words are frequent but most are rare. Learning representations for words in the "long tail" of this distribution requires enormous amounts of data. Representations of rare words trained directly on end tasks are usually poor, requiring us to pre-train embeddings on external data, or treat all rare words as out-of-vocabulary words with a single shared representation. We provide a method for predicting embeddings of rare words on the fly from small amounts of auxiliary data with a network trained end-to-end for the downstream task. We show that this improves results over baselines whose embeddings are trained on the end task, across reading comprehension, recognizing textual entailment and language modeling.
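
The abstract does not say how auxiliary data becomes an embedding, so the sketch below assumes one simple choice: mean-pool the vectors of the known words in a word's definition and add the result to the word's own vector, falling back to a shared unknown-word vector when the word has none. The names `embed_on_the_fly`, `base`, `auxiliary` and `unk` are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors component-wise."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def embed_on_the_fly(
    word: str,
    base: Dict[str, Vector],
    auxiliary: Dict[str, List[str]],
    unk: Vector,
) -> Vector:
    """Hypothetical on-the-fly embedding: own vector (or the shared UNK vector)
    plus the mean of the known-word vectors in the word's auxiliary definition."""
    own = base.get(word, unk)
    definition = auxiliary.get(word, [])
    known = [base[token] for token in definition if token in base]
    if not known:
        # No usable auxiliary data: fall back to the stored or UNK vector.
        return list(own)
    pooled = mean_pool(known)
    return [a + b for a, b in zip(own, pooled)]


base = {"small": [1.0, 0.0], "cat": [0.0, 1.0]}
auxiliary = {"kitten": ["small", "cat"]}
print(embed_on_the_fly("kitten", base, auxiliary, unk=[0.0, 0.0]))  # [0.5, 0.5]
```

In the setting the abstract describes, the component that turns auxiliary text into a vector is trained end-to-end with the downstream task; the fixed averaging here only shows where that computation sits.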

Can Neural Networks Understand Logical Entailment?

Feb 23, 2018
Richard Evans, David Saxton, David Amos, Pushmeet Kohli, Edward Grefenstette

We introduce a new dataset of logical entailments for the purpose of measuring models' ability to capture and exploit the structure of logical expressions on an entailment prediction task. We use this task to compare a series of architectures which are ubiquitous in the sequence-processing literature, in addition to a new model class, PossibleWorldNets, which computes entailment as a "convolution over possible worlds". Results show that convolutional networks present the wrong inductive bias for this class of problems relative to LSTM RNNs, that tree-structured neural networks outperform LSTM RNNs due to their enhanced ability to exploit the syntax of logic, and that PossibleWorldNets outperform all benchmarks.

* Published at ICLR 2018 (main conference)
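
The phrase "convolution over possible worlds" alludes to the semantic definition of entailment: A entails B exactly when B holds in every truth assignment (possible world) that makes A true. The brute-force checker below spells out that definition for propositional formulas written as nested tuples, a representation chosen only for this sketch. It is not PossibleWorldNets itself, only the classical definition the phrase refers to, and its exhaustive loop costs 2^n evaluations for n variables.

```python
from itertools import product
from typing import Dict, Set, Tuple, Union

# A formula is a variable name or a tuple (operator, operand, ...).
Formula = Union[str, Tuple]


def evaluate(f: Formula, world: Dict[str, bool]) -> bool:
    """Truth value of formula f under one truth assignment (a 'possible world')."""
    if isinstance(f, str):
        return world[f]
    op, *args = f
    if op == "not":
        return not evaluate(args[0], world)
    if op == "and":
        return evaluate(args[0], world) and evaluate(args[1], world)
    if op == "or":
        return evaluate(args[0], world) or evaluate(args[1], world)
    if op == "implies":
        return (not evaluate(args[0], world)) or evaluate(args[1], world)
    raise ValueError(f"unknown operator: {op}")


def variables(f: Formula) -> Set[str]:
    """Collect the propositional variables occurring in f."""
    if isinstance(f, str):
        return {f}
    return set().union(*(variables(arg) for arg in f[1:]))


def entails(a: Formula, b: Formula) -> bool:
    """a entails b iff b is true in every world in which a is true."""
    names = sorted(variables(a) | variables(b))
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if evaluate(a, world) and not evaluate(b, world):
            return False  # counter-example world found
    return True


print(entails(("and", "p", ("implies", "p", "q")), "q"))  # True
print(entails("p", ("and", "p", "q")))  # False
```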

Learning Explanatory Rules from Noisy Data

Jan 25, 2018
Richard Evans, Edward Grefenstette

Artificial neural networks are powerful function approximators capable of modelling solutions to a wide variety of problems, both supervised and unsupervised. As their size and expressivity increase, so too does the variance of the model, yielding a nearly ubiquitous overfitting problem. Although this is mitigated by a variety of model regularisation methods, the common cure is to seek large amounts of training data (which are not necessarily easily obtained) that sufficiently approximate the data distribution of the domain we wish to test on. In contrast, logic programming methods such as Inductive Logic Programming (ILP) offer an extremely data-efficient process by which models can be trained to reason on symbolic domains. However, these methods cannot handle the variety of domains neural networks can be applied to: they are not robust to noise in or mislabelling of inputs, and, perhaps more importantly, cannot be applied to non-symbolic domains where the data is ambiguous, such as operating on raw pixels. In this paper, we propose a Differentiable Inductive Logic framework, which can not only solve tasks which traditional ILP systems are suited for, but also shows a robustness to noise and error in the training data which ILP cannot cope with. Furthermore, as it is trained by backpropagation against a likelihood objective, it can be hybridised by connecting it with neural networks over ambiguous data in order to be applied to domains which ILP cannot address, while providing data efficiency and generalisation beyond what neural networks on their own can achieve.

* 64 pages, to appear in Journal of Artificial Intelligence Research (Special Track on Deep Learning, Knowledge Representation, and Reasoning)

The NarrativeQA Reading Comprehension Challenge

Dec 19, 2017
Tomáš Kočiský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, Edward Grefenstette

Reading comprehension (RC), in contrast to information retrieval, requires integrating information and reasoning about events, entities, and their relations across a full document. Question answering is conventionally used to assess RC ability, in both artificial agents and children learning to read. However, existing RC datasets and tasks are dominated by questions that can be solved by selecting answers using superficial information (e.g., local context similarity or global term frequency); they thus fail to test for the essential integrative aspect of RC. To encourage progress on deeper comprehension of language, we present a new dataset and set of tasks in which the reader must answer questions about stories by reading entire books or movie scripts. These tasks are designed so that successfully answering their questions requires understanding the underlying narrative rather than relying on shallow pattern matching or salience. We show that although humans solve the tasks easily, standard RC models struggle on the tasks presented here. We provide an analysis of the dataset and the challenges it presents.

The Neural Noisy Channel

Mar 06, 2017
Lei Yu, Phil Blunsom, Chris Dyer, Edward Grefenstette, Tomáš Kočiský

We formulate sequence-to-sequence transduction as a noisy channel decoding problem and use recurrent neural networks to parameterise the source and channel models. Unlike direct models, which can suffer from explaining-away effects during training, noisy channel models must produce outputs that explain their inputs, and their component models can be trained with not only paired training samples but also unpaired samples from the marginal output distribution. Using a latent variable to control how much of the conditioning sequence the channel model needs to read in order to generate a subsequent symbol, we obtain a tractable and effective beam search decoder. Experimental results on abstractive sentence summarisation, morphological inflection, and machine translation show that noisy channel models outperform direct models, and that they significantly benefit from increased amounts of unpaired output data that direct models cannot easily use.

* ICLR 2017
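
The noisy channel view rests on Bayes' rule: the best output y for an input x maximises p(x | y) p(y), so a channel model p(x | y) is combined with a source model p(y) that can be trained on unpaired output text. The sketch below only rescores a fixed list of candidates with that combined score; it does not implement the paper's latent-variable beam search decoder, and the function name, the `lm_weight` interpolation parameter and the toy probabilities are all hypothetical.

```python
import math
from typing import Callable, List


def noisy_channel_rerank(
    source: str,
    candidates: List[str],
    channel_logprob: Callable[[str, str], float],
    lm_logprob: Callable[[str], float],
    lm_weight: float = 1.0,
) -> List[str]:
    """Order candidate outputs y by log p(x | y) + lm_weight * log p(y)."""

    def score(y: str) -> float:
        return channel_logprob(source, y) + lm_weight * lm_logprob(y)

    return sorted(candidates, key=score, reverse=True)


# Toy, made-up probabilities purely for illustration.
CHANNEL = {("the cat sat down", "cat sat"): 0.6, ("the cat sat down", "cat"): 0.7}
LM = {"cat sat": 0.3, "cat": 0.05}


def channel(x: str, y: str) -> float:
    return math.log(CHANNEL[(x, y)])


def lm(y: str) -> float:
    return math.log(LM[y])


source = "the cat sat down"
print(noisy_channel_rerank(source, ["cat", "cat sat"], channel, lm))  # ['cat sat', 'cat']
print(noisy_channel_rerank(source, ["cat", "cat sat"], channel, lm, lm_weight=0.0))  # ['cat', 'cat sat']
```

Setting `lm_weight=0.0` drops the source model, and in this toy example the ranking flips, which shows how an output-side model trained on unpaired data can change the decision.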

Learning to Compose Words into Sentences with Reinforcement Learning

Nov 28, 2016
Dani Yogatama, Phil Blunsom, Chris Dyer, Edward Grefenstette, Wang Ling

We use reinforcement learning to learn tree-structured neural networks for computing representations of natural language sentences. In contrast with prior work on tree-structured models, in which the trees are either provided as input or predicted using supervision from explicit treebank annotations, the tree structures in this work are optimized to improve performance on a downstream task. Experiments demonstrate the benefit of learning task-specific composition orders: the learned models outperform both sequential encoders and recursive encoders based on treebank annotations. We analyze the induced trees and show that while they discover some linguistically intuitive structures (e.g., noun phrases, simple verb phrases), they differ from conventional English syntactic structures.
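
One common way to let a model choose its own tree is a shift-reduce action sequence: SHIFT pushes the next word onto a stack and REDUCE merges the top two items. The sketch below assumes that encoding and replaces the learned composition function with string bracketing so the induced tree is visible. In the setting the abstract describes, the actions would be chosen by a policy trained with reinforcement learning on downstream-task reward and the composition would be a neural network; all names here are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
SHIFT, REDUCE = "SHIFT", "REDUCE"


def compose_with_actions(
    words: Sequence[T],
    actions: Sequence[str],
    compose: Callable[[T, T], T],
) -> T:
    """Build one sentence representation by following SHIFT/REDUCE actions."""
    stack: List[T] = []
    position = 0
    for action in actions:
        if action == SHIFT:
            if position >= len(words):
                raise ValueError("SHIFT with an empty buffer")
            stack.append(words[position])
            position += 1
        elif action == REDUCE:
            if len(stack) < 2:
                raise ValueError("REDUCE needs two items on the stack")
            right = stack.pop()
            left = stack.pop()
            stack.append(compose(left, right))
        else:
            raise ValueError(f"unknown action: {action}")
    if position != len(words) or len(stack) != 1:
        raise ValueError("actions did not produce a single complete tree")
    return stack[0]


def bracket(left: str, right: str) -> str:
    # Stand-in for a learned composition function: just records the tree shape.
    return f"({left} {right})"


words = ["the", "cat", "sat"]
print(compose_with_actions(words, [SHIFT, SHIFT, REDUCE, SHIFT, REDUCE], bracket))  # ((the cat) sat)
print(compose_with_actions(words, [SHIFT, SHIFT, SHIFT, REDUCE, REDUCE], bracket))  # (the (cat sat))
```

Two action sequences over the same words yield different trees, which is exactly the choice the policy learns to make.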

Semantic Parsing with Semi-Supervised Sequential Autoencoders

Sep 29, 2016
Tomáš Kočiský, Gábor Melis, Edward Grefenstette, Chris Dyer, Wang Ling, Phil Blunsom, Karl Moritz Hermann

We present a novel semi-supervised approach for sequence transduction and apply it to semantic parsing. The unsupervised component is based on a generative model in which latent sentences generate the unpaired logical forms. We apply this method to a number of semantic parsing tasks focusing on domains with limited access to labelled training data and extend those datasets with synthetically generated logical forms.

Latent Predictor Networks for Code Generation

Jun 08, 2016
Wang Ling, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Andrew Senior, Fumin Wang, Phil Blunsom

Many language generation tasks require the production of text conditioned on both structured and unstructured inputs. We present a novel neural network architecture which generates an output sequence conditioned on an arbitrary number of input functions. Crucially, our approach allows both the choice of conditioning context and the granularity of generation, for example characters or tokens, to be marginalised, thus permitting scalable and effective training. Using this framework, we address the problem of generating programming code from a mixed natural language and structured specification. We create two new datasets for this paradigm derived from the collectible trading card games Magic: The Gathering and Hearthstone. On these, and a third preexisting corpus, we demonstrate that marginalising multiple predictors allows our model to outperform strong benchmarks.
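
Marginalising over predictors means the probability of each output token is a mixture: a weight for each predictor times the probability that predictor assigns to the token, summed over predictors. The sketch below computes that mixture for a single token with made-up numbers; the predictor names (a vocabulary generator and a copier for a card-name field) and the function name are hypothetical, and the handling of generation granularity (characters versus tokens) is not shown.

```python
from typing import Dict


def marginal_token_prob(
    token: str,
    predictor_weights: Dict[str, float],
    predictor_dists: Dict[str, Dict[str, float]],
) -> float:
    """p(token) = sum_k p(predictor k) * p(token | predictor k)."""
    if abs(sum(predictor_weights.values()) - 1.0) > 1e-9:
        raise ValueError("predictor weights must sum to 1")
    return sum(
        weight * predictor_dists[name].get(token, 0.0)
        for name, weight in predictor_weights.items()
    )


# Hypothetical predictors: a vocabulary softmax and a copier over one input field.
weights = {"generate": 0.3, "copy_name": 0.7}
dists = {
    "generate": {"def": 0.5, "self": 0.4, "Fireball": 0.1},
    "copy_name": {"Fireball": 1.0},
}
print(marginal_token_prob("Fireball", weights, dists))  # approximately 0.73
```

Training on the log of this sum means the data never has to say which predictor produced a token; in this example the copier supplies most of the probability mass for the card name.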

Reasoning about Entailment with Neural Attention

Mar 01, 2016
Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Phil Blunsom

While most approaches to automatically recognizing entailment relations have used classifiers employing hand-engineered features derived from complex natural language processing pipelines, in practice their performance has been only slightly better than bag-of-word pair classifiers using only lexical similarity. The only attempt so far to build an end-to-end differentiable neural network for entailment failed to outperform such a simple similarity classifier. In this paper, we propose a neural model that reads two sentences to determine entailment using long short-term memory units. We extend this model with a word-by-word neural attention mechanism that encourages reasoning over entailments of pairs of words and phrases. Furthermore, we present a qualitative analysis of attention weights produced by this model, demonstrating such reasoning capabilities. On a large entailment dataset this model outperforms the previous best neural model and a classifier with engineered features by a substantial margin. It is the first generic end-to-end differentiable system that achieves state-of-the-art accuracy on a textual entailment dataset.

* ICLR 2016 camera-ready, 9 pages, 10 figures (incl. subfigures)
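
Word-by-word attention gives every hypothesis word its own probability distribution over the premise words, rather than one distribution for the whole sentence. The sketch below shows only that structure, using toy state vectors and plain dot-product scores as a simplifying substitute for the learned scoring function in the paper; the LSTM encoders and the final entailment classifier are omitted, and all names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def word_by_word_attention(
    premise: List[Vector], hypothesis: List[Vector]
) -> List[Tuple[List[float], Vector]]:
    """For each hypothesis state, attention weights over premise states and the
    resulting weighted summary of the premise."""
    dim = len(premise[0])
    outputs = []
    for h in hypothesis:
        weights = softmax([dot(y, h) for y in premise])
        context = [sum(w * y[i] for w, y in zip(weights, premise)) for i in range(dim)]
        outputs.append((weights, context))
    return outputs


premise = [[1.0, 0.0], [0.0, 1.0]]       # toy states for two premise words
hypothesis = [[10.0, 0.0], [0.0, 10.0]]  # toy states for two hypothesis words
for weights, _ in word_by_word_attention(premise, hypothesis):
    print([round(w, 3) for w in weights])
```

Each hypothesis state here attends almost entirely to the premise state it aligns with, which is the kind of word-level alignment the qualitative analysis of attention weights inspects.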