
"Text": models, code, and papers

ExpBERT: Representation Engineering with Natural Language Explanations

May 05, 2020
Shikhar Murty, Pang Wei Koh, Percy Liang

Suppose we want to specify the inductive bias that married couples typically go on honeymoons for the task of extracting pairs of spouses from text. In this paper, we allow model developers to specify these types of inductive biases as natural language explanations. We use BERT fine-tuned on MultiNLI to "interpret" these explanations with respect to the input sentence, producing explanation-guided representations of the input. Across three relation extraction tasks, our method, ExpBERT, matches a BERT baseline but with 3-20x less labeled data and improves on the baseline by 3-10 F1 points with the same amount of labeled data.
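The pipeline can be sketched in a few lines. Everything here is illustrative: `interpret` is a toy, hash-seeded stand-in for the fine-tuned MultiNLI model, and the dimensions and function names are assumptions, not the paper's actual interface.

```python
import zlib
import numpy as np

def interpret(sentence: str, explanation: str, dim: int = 8) -> np.ndarray:
    # Toy stand-in for the fine-tuned NLI model: deterministically map a
    # (sentence, explanation) pair to a fixed-size feature vector.
    seed = zlib.crc32(f"{sentence}||{explanation}".encode())
    rng = np.random.default_rng(seed)
    return rng.standard_normal(dim)

def expbert_features(sentence, explanations, base_repr):
    # Concatenate the base input representation with one "interpretation"
    # vector per natural-language explanation.
    parts = [base_repr] + [interpret(sentence, e) for e in explanations]
    return np.concatenate(parts)

base = np.zeros(16)  # stand-in for a BERT sentence embedding
feats = expbert_features(
    "Alice and Bob returned from their honeymoon.",
    ["married couples typically go on honeymoons"],
    base,
)
```

The downstream classifier then consumes `feats` instead of the bare BERT embedding, so each explanation contributes a fixed slice of the representation.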

* ACL 2020 

Revisiting Unsupervised Relation Extraction

Apr 30, 2020
Thy Thy Tran, Phong Le, Sophia Ananiadou

Unsupervised relation extraction (URE) extracts relations between named entities from raw text without manually-labelled data and existing knowledge bases (KBs). URE methods can be categorised into generative and discriminative approaches, which rely on either hand-crafted features or surface forms. However, we demonstrate that by using only named entities to induce relation types, we can outperform existing methods on two popular datasets. We conduct a comparison and evaluation of our findings with other URE techniques, to ascertain the important features in URE. We conclude that entity types provide a strong inductive bias for URE.
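The entity-type inductive bias can be illustrated in a few lines: induce a relation type purely from the pair of entity types, ignoring the sentence itself. The type labels and the `->` naming scheme below are illustrative, not the paper's.

```python
from collections import Counter

def induce_relation(head_type: str, tail_type: str) -> str:
    # Induce a relation label from the two entity types alone --
    # the inductive bias the paper highlights.
    return f"{head_type}->{tail_type}"

# Toy entity-pair mentions with NER-style types.
mentions = [("PERSON", "ORG"), ("PERSON", "GPE"), ("PERSON", "ORG")]
relations = Counter(induce_relation(h, t) for h, t in mentions)
```

Each distinct type pair becomes a candidate relation cluster, which is then compared against gold relation labels.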

* 8 pages, 1 figure, 2 tables. Accepted in ACL 2020 

Multi-Step Inference for Reasoning Over Paragraphs

Apr 06, 2020
Jiangming Liu, Matt Gardner

Complex reasoning over text requires understanding and chaining together free-form predicates and logical connectives. Prior work has largely tried to do this either symbolically or with black-box transformers. We present a middle ground between these two extremes: a compositional model reminiscent of neural module networks that can perform chained logical reasoning. This model first finds relevant sentences in the context and then chains them together using neural modules. Our model gives significant performance improvements (up to 29% relative error reduction when combined with a reranker) on ROPES, a recently introduced complex reasoning dataset.
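The two-stage idea (select relevant sentences, then chain module applications) can be sketched as below. The word-overlap selector is a heuristic stand-in for the model's learned selection module, and `chain` just composes arbitrary step functions; none of these names come from the paper.

```python
def find_relevant(question: str, context: list[str]) -> list[str]:
    # Heuristic stand-in for the learned sentence-selection step:
    # rank context sentences by word overlap with the question.
    q = set(question.lower().split())
    overlap = lambda s: len(q & set(s.lower().split()))
    scored = sorted(context, key=lambda s: -overlap(s))
    return [s for s in scored if overlap(s) > 0]

def chain(modules, state):
    # Neural-module-style chaining: each module transforms the state.
    for module in modules:
        state = module(state)
    return state

context = [
    "Fertilizer adds nutrients to soil.",
    "More nutrients make plants grow taller.",
    "The sky is blue.",
]
relevant = find_relevant("does fertilizer make plants grow", context)
answer_state = chain([str.lower, str.strip], "  Taller  ")
```

In the real model both stages are differentiable; the sketch only shows the control flow.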

Exploiting Temporal Coherence for Multi-modal Video Categorization

Feb 07, 2020
Palash Goyal, Saurabh Sahu, Shalini Ghosh, Chul Lee

Multimodal ML models can process data in multiple modalities (e.g., video, images, audio, text) and are useful for video content analysis in a variety of problems (e.g., object detection, scene understanding). In this paper, we focus on the problem of video categorization by using a multimodal approach. We have developed a novel temporal coherence-based regularization approach, which applies to different types of models (e.g., RNN, NetVLAD, Transformer). We demonstrate through experiments how our proposed multimodal video categorization models with temporal coherence outperform strong state-of-the-art baseline models.
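One plausible form of a temporal-coherence regularizer (the exact loss in the paper may differ) penalizes large jumps between representations of consecutive frames, i.e. adding sum_t ||h_t - h_{t-1}||^2 to the task loss:

```python
import numpy as np

def temporal_coherence_penalty(frame_reprs):
    # Penalize large jumps between consecutive frame representations:
    # sum over t of ||h_t - h_{t-1}||^2.
    diffs = np.diff(frame_reprs, axis=0)
    return float(np.sum(diffs ** 2))

smooth = np.ones((5, 2))           # identical frame embeddings -> penalty 0
jumpy = np.array([[0.0, 0.0],
                  [1.0, 1.0],
                  [0.0, 0.0]])     # representation flips every frame
```

During training this penalty would be weighted by a hyperparameter and added to the categorization loss, pushing the model toward temporally smooth representations.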

Review of Probability Distributions for Modeling Count Data

Jan 10, 2020
F. William Townes

Count data take on non-negative integer values and are challenging to properly analyze using standard linear-Gaussian methods such as linear regression and principal components analysis. Generalized linear models enable direct modeling of counts in a regression context using distributions such as the Poisson and negative binomial. When counts contain only relative information, multinomial or Dirichlet-multinomial models can be more appropriate. We review some of the fundamental connections between multinomial and count models from probability theory, providing detailed proofs. These relationships are useful for methods development in applications such as topic modeling of text data and genomics.
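One of the classical connections such a review covers: independent Poisson counts, conditioned on their sum, are multinomial with probabilities proportional to the rates. A quick numerical check (the rates and counts below are arbitrary):

```python
from math import exp, factorial, prod

def poisson_pmf(k, lam):
    return lam ** k * exp(-lam) / factorial(k)

def multinomial_pmf(counts, probs):
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    return coef * prod(p ** c for p, c in zip(probs, counts))

lams = [2.0, 3.0, 5.0]     # independent Poisson rates
counts = [1, 2, 3]         # an observed count vector
n, total = sum(counts), sum(lams)

# P(counts | sum = n) via Bayes: joint Poisson / Poisson of the total...
joint = prod(poisson_pmf(c, l) for c, l in zip(counts, lams))
conditional = joint / poisson_pmf(n, total)
# ...equals the multinomial pmf with probs proportional to the rates.
direct = multinomial_pmf(counts, [l / total for l in lams])
```

This identity is what lets relative-abundance models (multinomial) and independent-count models (Poisson) be used interchangeably in many settings.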

Neural Machine Translation: A Review

Dec 04, 2019
Felix Stahlberg

The field of machine translation (MT), the automatic translation of written text from one natural language into another, has experienced a major paradigm shift in recent years. Statistical MT, which mainly relies on various count-based models and which used to dominate MT research for decades, has largely been superseded by neural machine translation (NMT), which tackles translation with a single neural network. In this work we will trace back the origins of modern NMT architectures to word and sentence embeddings and earlier examples of the encoder-decoder network family. We will conclude with a survey of recent trends in the field.

Structured Sparsification of Gated Recurrent Neural Networks

Nov 13, 2019
Ekaterina Lobacheva, Nadezhda Chirkova, Alexander Markovich, Dmitry Vetrov

Recently, many techniques have been developed to sparsify the weights of neural networks and to remove structural units of networks, e.g. neurons. We adapt existing sparsification approaches to gated recurrent architectures. Specifically, in addition to sparsifying weights and neurons, we propose sparsifying the preactivations of gates. This makes some gates constant and simplifies the LSTM structure. We test our approach on text classification and language modeling tasks. We observe that the resulting structure of gate sparsity depends on the task, and we connect the learned structure to the specifics of the particular tasks. Our method also improves neuron-wise compression of the model in most tasks.
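The "constant gate" effect can be sketched as follows: once sparsification drives a gate's entire row of input weights to (near) zero, the gate no longer depends on the input and collapses to the constant sigmoid(bias). The helper below is illustrative, not the paper's code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def constant_gates(W_gate, b_gate, tol=1e-6):
    # A gate whose whole weight row was sparsified away is input-independent:
    # its preactivation is just the bias, so its value is sigmoid(bias).
    row_norms = np.linalg.norm(W_gate, axis=1)
    return {i: float(sigmoid(b_gate[i]))
            for i in range(len(b_gate)) if row_norms[i] < tol}

W = np.array([[0.0, 0.0],    # gate 0: weights sparsified away entirely
              [0.5, -0.3]])  # gate 1: still input-dependent
b = np.array([2.0, 0.1])
const = constant_gates(W, b)
```

A constant gate close to 1 or 0 can then be folded into the cell update, which is how the simplified LSTM structure arises.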

* Published in Workshop on Context and Compositionality in Biological and Artificial Neural Systems, NeurIPS 2019 

Small and Practical BERT Models for Sequence Labeling

Aug 31, 2019
Henry Tsai, Jason Riesa, Melvin Johnson, Naveen Arivazhagan, Xin Li, Amelia Archer

We propose a practical scheme to train a single multilingual sequence labeling model that yields state-of-the-art results and is small and fast enough to run on a single CPU. Starting from a public multilingual BERT checkpoint, our final model is 6x smaller and 27x faster, and has higher accuracy than a state-of-the-art multilingual baseline. We show that our model particularly outperforms the baseline on low-resource languages, and works on code-mixed input text without being explicitly trained on code-mixed examples. We showcase the effectiveness of our method by reporting on part-of-speech tagging and morphological prediction on 70 treebanks and 48 languages.
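A common way to shrink a BERT checkpoint into a smaller, faster student is knowledge distillation: train the student against the teacher's temperature-softened output distribution. The minimal soft-label loss below is a generic sketch of that technique, not necessarily the exact recipe of this paper.

```python
import numpy as np

def softmax(logits, T=1.0):
    z = np.asarray(logits, dtype=float) / T
    z = z - z.max()            # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Cross-entropy of the student against the teacher's temperature-
    # softened distribution (the soft-label part of distillation).
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    return float(-(p_teacher * log_p_student).sum())

teacher = [4.0, 1.0, -2.0]
matched = distillation_loss(teacher, teacher)        # student copies teacher
mismatched = distillation_loss([-2.0, 1.0, 4.0], teacher)
```

The loss is minimized exactly when the student reproduces the teacher's distribution, so `matched` is strictly smaller than `mismatched`.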

* 11 pages including appendices; accepted to appear at EMNLP-IJCNLP 2019 

GLOSS: Generative Latent Optimization of Sentence Representations

Jul 15, 2019
Sidak Pal Singh, Angela Fan, Michael Auli

We propose a method to learn unsupervised sentence representations in a non-compositional manner based on Generative Latent Optimization. Our approach does not impose any assumptions on how words are to be combined into a sentence representation. We discuss a simple Bag of Words model as well as a variant that models word positions. Both are trained to reconstruct the sentence based on a latent code and our model can be used to generate text. Experiments show large improvements over the related Paragraph Vectors. Compared to uSIF, we achieve a relative improvement of 5% when trained on the same data, and our method performs competitively with Sent2vec while being trained on 30 times less data.
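The latent-optimization idea, in its simplest form: each sentence gets its own free code vector, optimized by gradient descent so that a shared decoder reconstructs the sentence's bag of words. The squared-error decoder below is a deliberate simplification (GLOSS uses a proper generative objective), and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 10, 4
W = rng.standard_normal((vocab, dim)) * 0.1  # shared decoder, held fixed here

bow = np.zeros(vocab)                        # bag-of-words target counts
bow[1], bow[3], bow[7] = 1, 2, 1

z = np.zeros(dim)                            # this sentence's latent code
losses = []
for _ in range(200):
    err = W @ z - bow                        # reconstruction residual
    losses.append(float(err @ err))
    z -= 0.5 * (W.T @ err)                   # gradient step on the code only
```

In the full method the decoder parameters are optimized jointly with all the per-sentence codes; here only the code moves, which is enough to show the reconstruction loss shrinking.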

Myers-Briggs Personality Classification and Personality-Specific Language Generation Using Pre-trained Language Models

Jul 15, 2019
Sedrick Scott Keh, I-Tsun Cheng

The Myers-Briggs Type Indicator (MBTI) is a popular personality metric that uses four dichotomies as indicators of personality traits. This paper examines the use of pre-trained language models to predict MBTI personality types based on scraped labeled texts. The proposed model reaches an accuracy of 0.47 for correctly predicting all 4 types and 0.86 for correctly predicting at least 2 types. Furthermore, we investigate the possible uses of a fine-tuned BERT model for personality-specific language generation. This is a task essential for both modern psychology and for intelligent empathetic systems.
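The two reported metrics ("all 4 correct" vs. "at least 2 correct") can be computed from per-dichotomy predictions with a small helper; the toy predictions below are made up for illustration.

```python
def mbti_accuracy(preds, golds, k):
    # Fraction of examples where at least k of the 4 MBTI dichotomies
    # (I/E, N/S, T/F, J/P) are predicted correctly.
    hits = sum(
        1 for p, g in zip(preds, golds)
        if sum(a == b for a, b in zip(p, g)) >= k
    )
    return hits / len(golds)

preds = ["INTJ", "ENFP", "ISTP"]
golds = ["INTP", "ENFP", "ESFJ"]
all_four = mbti_accuracy(preds, golds, 4)     # exact type match
at_least_two = mbti_accuracy(preds, golds, 2)
```

With k=4 this is ordinary exact-match accuracy; lowering k gives partial credit per dichotomy, which is why the 0.86 figure is much higher than the 0.47 one.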