Pontus Stenetorp

On the Importance of Strong Baselines in Bayesian Deep Learning

Nov 30, 2018
Jishnu Mukhoti, Pontus Stenetorp, Yarin Gal

Like all sub-fields of machine learning, Bayesian Deep Learning is driven by empirical validation of its theoretical proposals. Given the many aspects of an experiment, it is always possible that minor or even major experimental flaws can slip by both authors and reviewers. One of the most popular experiments used to evaluate approximate inference techniques is the regression experiment on UCI datasets. However, in this experiment, models which have been trained to convergence have often been compared with baselines trained only for a fixed number of iterations. We find that a well-established baseline, Monte Carlo dropout, when evaluated under the same experimental settings, shows significant improvements. In fact, the baseline outperforms or performs competitively with methods that claimed to be superior to the very same baseline method when they were introduced. Hence, by exposing this flaw in experimental procedure, we highlight the importance of using identical experimental setups to evaluate, compare, and benchmark methods in Bayesian Deep Learning.

* Bayesian Deep Learning Workshop, NeurIPS 2018
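
For readers unfamiliar with the baseline named in the abstract above, Monte Carlo dropout keeps dropout active at prediction time and aggregates several stochastic forward passes. The following is a minimal NumPy sketch of that idea, not the authors' experimental code: the one-hidden-layer network, its random weights, the dropout rate, and the function name `mc_dropout_predict` are all illustrative assumptions.

```python
import numpy as np


def mc_dropout_predict(x, w1, b1, w2, b2, rng, p_drop=0.1, n_samples=100):
    """Return the mean and variance of stochastic forward passes.

    Hypothetical one-hidden-layer regression network. Dropout stays
    active at prediction time, which is the core idea of MC dropout.
    """
    preds = []
    for _ in range(n_samples):
        h = np.maximum(0.0, x @ w1 + b1)       # ReLU hidden layer
        mask = rng.random(h.shape) >= p_drop   # keep each unit with prob 1 - p_drop
        h = h * mask / (1.0 - p_drop)          # inverted-dropout rescaling
        preds.append(h @ w2 + b2)
    preds = np.stack(preds)                    # shape: (n_samples, n_points, 1)
    return preds.mean(axis=0), preds.var(axis=0)


# Toy usage with random (untrained) weights, purely for illustration.
rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(3, 16)), np.zeros(16)
w2, b2 = rng.normal(size=(16, 1)), np.zeros(1)
x = rng.normal(size=(5, 3))
mean, var = mc_dropout_predict(x, w1, b1, w2, b2, rng)
```

The variance returned here is only the spread across sampled passes; a full predictive variance for regression would typically also include an observation-noise term, which this sketch omits.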

Application of Clinical Concept Embeddings for Heart Failure Prediction in UK EHR data

Nov 28, 2018
Spiros Denaxas, Pontus Stenetorp, Sebastian Riedel, Maria Pikoula, Richard Dobson, Harry Hemingway

Electronic health records (EHR) are increasingly being used for constructing disease risk prediction models. Feature engineering in EHR data, however, is challenging due to their highly dimensional and heterogeneous nature. Low-dimensional representations of EHR data can potentially mitigate these challenges. In this paper, we use global vectors (GloVe) to learn word embeddings for diagnoses and procedures recorded using 13 million ontology terms across 2.7 million hospitalisations in national UK EHR. We demonstrate the utility of these embeddings by evaluating their performance in identifying patients who are at higher risk of being hospitalised for congestive heart failure. Our findings indicate that embeddings can enable the creation of robust EHR-derived disease risk prediction models and address some of the limitations associated with manual clinical feature engineering.

* Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

Wronging a Right: Generating Better Errors to Improve Grammatical Error Detection

Sep 26, 2018
Sudhanshu Kasewa, Pontus Stenetorp, Sebastian Riedel

Grammatical error correction, like other machine learning tasks, greatly benefits from large quantities of high quality training data, which is typically expensive to produce. While writing a program to automatically generate realistic grammatical errors would be difficult, one could learn the distribution of naturally occurring errors and attempt to introduce them into other datasets. Initial work on inducing errors in this way using statistical machine translation has shown promise; we investigate cheaply constructing synthetic samples, given a small corpus of human-annotated data, using an off-the-rack attentive sequence-to-sequence model and a straightforward post-processing procedure. Our approach yields error-filled artificial data that helps a vanilla bi-directional LSTM to outperform the previous state of the art at grammatical error detection, and a previously introduced model to gain further improvements of over 5% $F_{0.5}$ score. When attempting to determine if a given sentence is synthetic, a human annotator at best achieves 39.39 $F_1$ score, indicating that our model generates mostly human-like instances.

* Accepted as a short paper at EMNLP 2018
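
The abstract above reports both $F_{0.5}$ and $F_1$ scores. As a reminder of how the two relate, here is a small sketch of the standard $F_\beta$ formula; the function name `f_beta` and the example precision and recall values are illustrative assumptions, not figures from the paper.

```python
def f_beta(precision, recall, beta=0.5):
    """Standard F_beta score; beta < 1 favours precision over recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up values: a detector with high precision but low recall.
f05 = f_beta(0.8, 0.4, beta=0.5)
f1 = f_beta(0.8, 0.4, beta=1.0)
```

With these made-up inputs, `f05` is about 0.667 and `f1` about 0.533, which shows why error detection, where false alarms are costly, is often scored with $F_{0.5}$.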

Convolutional 2D Knowledge Graph Embeddings

Jul 04, 2018
Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel

Link prediction for knowledge graphs is the task of predicting missing relationships between entities. Previous work on link prediction has focused on shallow, fast models which can scale to large knowledge graphs. However, these models learn less expressive features than deep, multi-layer models -- which potentially limits performance. In this work, we introduce ConvE, a multi-layer convolutional network model for link prediction, and report state-of-the-art results for several established datasets. We also show that the model is highly parameter efficient, yielding the same performance as DistMult and R-GCN with 8x and 17x fewer parameters. Analysis of our model suggests that it is particularly effective at modelling nodes with high indegree -- which are common in highly-connected, complex knowledge graphs such as Freebase and YAGO3. In addition, it has been noted that the WN18 and FB15k datasets suffer from test set leakage, due to inverse relations from the training set being present in the test set -- however, the extent of this issue has so far not been quantified. We find this problem to be severe: a simple rule-based model can achieve state-of-the-art results on both WN18 and FB15k. To ensure that models are evaluated on datasets where simply exploiting inverse relations cannot yield competitive results, we investigate and validate several commonly used datasets -- deriving robust variants where necessary. We then perform experiments on these robust datasets for our own and several previously proposed models and find that ConvE achieves state-of-the-art Mean Reciprocal Rank across most datasets.

* Extended AAAI2018 paper
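
The abstract above reports results in Mean Reciprocal Rank (MRR). A minimal sketch of how that metric is usually computed follows; the helper names and the toy scores are illustrative assumptions, and the "filtered" ranking protocol common in link prediction evaluations is not reproduced here.

```python
def rank_of_target(scores, target):
    """1-based rank of `target` when candidates are sorted by descending score."""
    target_score = scores[target]
    return 1 + sum(1 for s in scores.values() if s > target_score)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test queries (ranks are 1-based)."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Toy example: candidate entities scored for one query.
scores = {"paris": 0.9, "berlin": 0.4, "rome": 0.7}
rome_rank = rank_of_target(scores, "rome")   # two candidates score >= rome, one strictly higher
mrr = mean_reciprocal_rank([1, 2, 4])        # (1 + 1/2 + 1/4) / 3
```

Because each query contributes at most 1, MRR rewards placing the correct entity near the top far more than it penalises a very low rank.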

Jack the Reader - A Machine Reading Framework

Jun 20, 2018
Dirk Weissenborn, Pasquale Minervini, Tim Dettmers, Isabelle Augenstein, Johannes Welbl, Tim Rocktäschel, Matko Bošnjak, Jeff Mitchell, Thomas Demeester, Pontus Stenetorp, Sebastian Riedel

Many Machine Reading and Natural Language Understanding tasks require reading supporting text in order to answer questions. For example, in Question Answering, the supporting text can be newswire or Wikipedia articles; in Natural Language Inference, premises can be seen as the supporting text and hypotheses as questions. Providing a set of useful primitives operating in a single framework of related tasks would allow for expressive modelling, and easier model comparison and replication. To that end, we present Jack the Reader (Jack), a framework for Machine Reading that allows for quick model prototyping by component reuse, evaluation of new models on existing datasets as well as integrating new datasets and applying them on a growing set of implemented baseline models. Jack currently supports (but is not limited to) three tasks: Question Answering, Natural Language Inference, and Link Prediction. It is developed with the aim of increasing research efficiency and code reuse.

* Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2018), System Demonstrations

Constructing Datasets for Multi-hop Reading Comprehension Across Documents

Jun 11, 2018
Johannes Welbl, Pontus Stenetorp, Sebastian Riedel

Most Reading Comprehension methods limit themselves to queries which can be answered using a single sentence, paragraph, or document. Enabling models to combine disjoint pieces of textual evidence would extend the scope of machine comprehension methods, but currently there exist no resources to train and test this capability. We propose a novel task to encourage the development of models for text understanding across multiple documents and to investigate the limits of existing methods. In our task, a model learns to seek and combine evidence - effectively performing multi-hop (alias multi-step) inference. We devise a methodology to produce datasets for this task, given a collection of query-answer pairs and thematically linked documents. Two datasets from different domains are induced, and we identify potential pitfalls and devise circumvention strategies. We evaluate two previously proposed competitive models and find that one can integrate information across documents. However, both models struggle to select relevant information, as providing documents guaranteed to be relevant greatly improves their performance. While the models outperform several strong baselines, their best accuracy reaches 42.9% compared to human performance at 74.0% - leaving ample room for improvement.

* Transactions of the Association for Computational Linguistics (TACL), Vol 6 (2018), pages 287-302
* This paper directly corresponds to the TACL version (https://transacl.org/ojs/index.php/tacl/article/view/1325) apart from minor changes in wording, additional footnotes, and appendices

Extrapolation in NLP

May 17, 2018
Jeff Mitchell, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel

We argue that extrapolation to examples outside the training space will often be easier for models that capture global structures, rather than just maximise their local fit to the training data. We show that this is true for two popular models: the Decomposable Attention Model and word2vec.

Neural Architectures for Fine-grained Entity Type Classification

Feb 21, 2017
Sonse Shimaoka, Pontus Stenetorp, Kentaro Inui, Sebastian Riedel

In this work, we investigate several neural network architectures for fine-grained entity type classification. Particularly, we consider extensions to a recently proposed attentive neural architecture and make three key contributions. Previous work on attentive neural architectures does not consider hand-crafted features; we combine learnt and hand-crafted features and observe that they complement each other. Additionally, through quantitative analysis we establish that the attention mechanism is capable of learning to attend over syntactic heads and the phrase containing the mention, where both are known strong hand-crafted features for our task. We enable parameter sharing through a hierarchical label encoding method that, in low-dimensional projections, shows clear clusters for each type hierarchy. Lastly, despite using the same evaluation dataset, the literature frequently compares models trained using different data. We establish that the choice of training data has a drastic impact on performance, with decreases by as much as 9.85% loose micro F1 score for a previously proposed method. Despite this, our best model achieves state-of-the-art results with 75.36% loose micro F1 score on the well-established FIGER (GOLD) dataset.

* 10 pages, 3 figures, accepted at EACL2017 conference
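
The abstract above reports loose micro F1. A small sketch of that metric follows, assuming the definition commonly used for fine-grained entity typing: label overlaps are pooled over all mentions before precision and recall are computed. The function name and the toy labels are illustrative assumptions, and the paper's exact evaluation script may differ.

```python
def loose_micro_f1(gold, pred):
    """gold, pred: lists of sets of type labels, one set per mention."""
    overlap = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = overlap / n_pred if n_pred else 0.0
    recall = overlap / n_gold if n_gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: partially correct predictions still earn credit.
gold = [{"/person", "/person/artist"}, {"/location"}]
pred = [{"/person"}, {"/location", "/location/city"}]
score = loose_micro_f1(gold, pred)  # precision = recall = 2/3
```

Unlike a strict metric, which would score both toy mentions as wrong because the label sets do not match exactly, this pooled variant gives partial credit for each correct label.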

Deep Semi-Supervised Learning with Linguistically Motivated Sequence Labeling Task Hierarchies

Dec 29, 2016
Jonathan Godwin, Pontus Stenetorp, Sebastian Riedel

In this paper we present a novel Neural Network algorithm for conducting semi-supervised learning for sequence labeling tasks arranged in a linguistically motivated hierarchy. This relationship is exploited to regularise the representations of supervised tasks by backpropagating the error of the unsupervised task through the supervised tasks. We introduce a neural network where lower layers are supervised by junior downstream tasks and the final layer task is an auxiliary unsupervised task. The architecture shows improvements of up to two percentage points F1 for Chunking compared to a plausible baseline.

Learning to Reason With Adaptive Computation

Nov 10, 2016
Mark Neumann, Pontus Stenetorp, Sebastian Riedel

Multi-hop inference is necessary for machine learning systems to successfully solve tasks such as Recognising Textual Entailment and Machine Reading. In this work, we demonstrate the effectiveness of adaptive computation for learning the number of inference steps required for examples of different complexity and that learning the correct number of inference steps is difficult. We introduce the first model involving Adaptive Computation Time which provides a small performance benefit on top of a similar model without an adaptive component as well as enabling considerable insight into the reasoning process of the model.

* Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems
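
The abstract above builds on Adaptive Computation Time (ACT). As a rough illustration of the general halting rule in ACT, and not of the model proposed in this paper, the sketch below converts per-step halting probabilities into weights over inference steps. The function name, the threshold `epsilon`, and the example probabilities are illustrative assumptions.

```python
def act_halting_weights(halt_probs, epsilon=0.01):
    """Turn per-step halting probabilities into a distribution over steps.

    Steps are consumed until the running sum would reach 1 - epsilon
    (or the step budget runs out); the final step receives the remainder
    so that the weights sum to one.
    """
    weights = []
    cumulative = 0.0
    for i, h in enumerate(halt_probs):
        is_last = i == len(halt_probs) - 1
        if cumulative + h >= 1.0 - epsilon or is_last:
            weights.append(1.0 - cumulative)  # remainder goes to the final step
            break
        weights.append(h)
        cumulative += h
    return weights


# The fourth step is never used: computation halts at the third.
weights = act_halting_weights([0.1, 0.3, 0.7, 0.9])
```

In a full model, these weights would typically be used to average the intermediate states, so easy examples halt after few steps while harder ones use more of the budget.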
