"Text": models, code, and papers

Adversarial Training Methods for Semi-Supervised Text Classification

May 06, 2017
Takeru Miyato, Andrew M. Dai, Ian Goodfellow

Adversarial training provides a means of regularizing supervised learning algorithms while virtual adversarial training is able to extend supervised learning algorithms to the semi-supervised setting. However, both methods require making small perturbations to numerous entries of the input vector, which is inappropriate for sparse high-dimensional inputs such as one-hot word representations. We extend adversarial and virtual adversarial training to the text domain by applying perturbations to the word embeddings in a recurrent neural network rather than to the original input itself. The proposed method achieves state of the art results on multiple benchmark semi-supervised and purely supervised tasks. We provide visualizations and analysis showing that the learned word embeddings have improved in quality and that while training, the model is less prone to overfitting.

* Published as a conference paper at ICLR 2017 
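
The idea of perturbing embeddings instead of one-hot inputs can be illustrated with a minimal sketch. It assumes a toy logistic classifier over a single embedding vector, not the recurrent network the abstract describes; the names `loss_and_grad`, `adversarial_perturbation`, and `epsilon` are hypothetical and chosen only for illustration. The perturbation is taken along the normalized loss gradient with respect to the embedding and scaled to an L2 norm of `epsilon`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_grad(embedding, label, weights, bias):
    """Negative log-likelihood of a toy logistic classifier and its
    gradient with respect to the embedding (not the weights)."""
    z = sum(w * e for w, e in zip(weights, embedding)) + bias
    p = sigmoid(z)
    loss = -math.log(p) if label == 1 else -math.log(1.0 - p)
    # d(loss)/d(embedding) = (p - label) * weights for logistic regression
    grad = [(p - label) * w for w in weights]
    return loss, grad


def adversarial_perturbation(grad, epsilon):
    """Scale the gradient direction to L2 norm epsilon (assumed bound)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


# Toy values, invented for illustration only.
embedding = [0.5, -1.0, 0.25]
weights = [1.0, -2.0, 0.5]
bias = 0.0
label = 1

clean_loss, grad = loss_and_grad(embedding, label, weights, bias)
r_adv = adversarial_perturbation(grad, epsilon=0.1)
perturbed = [e + r for e, r in zip(embedding, r_adv)]
adv_loss, _ = loss_and_grad(perturbed, label, weights, bias)

print(clean_loss, adv_loss)
```

In a training loop, the loss at the perturbed embedding would typically be added to the clean loss as a regularization term. The virtual adversarial variant, which needs no labels, is not shown here.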


Rationale-Augmented Convolutional Neural Networks for Text Classification

Sep 24, 2016
Ye Zhang, Iain Marshall, Byron C. Wallace

We present a new Convolutional Neural Network (CNN) model for text classification that jointly exploits labels on documents and their component sentences. Specifically, we consider scenarios in which annotators explicitly mark sentences (or snippets) that support their overall document categorization, i.e., they provide rationales. Our model exploits such supervision via a hierarchical approach in which each document is represented by a linear combination of the vector representations of its component sentences. We propose a sentence-level convolutional model that estimates the probability that a given sentence is a rationale, and we then scale the contribution of each sentence to the aggregate document representation in proportion to these estimates. Experiments on five classification datasets that have document labels and associated rationales demonstrate that our approach consistently outperforms strong baselines. Moreover, our model naturally provides explanations for its predictions.
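
The aggregation step described above, where each sentence contributes to the document representation in proportion to its estimated rationale probability, might look roughly like the following sketch. The sentence vectors and probabilities are invented values standing in for the outputs of the sentence-level model, the function name `document_vector` is hypothetical, and normalizing the weights to sum to one is an assumption rather than something the abstract states.

```python
def document_vector(sentence_vectors, rationale_probs):
    """Weighted sum of sentence vectors, with weights proportional to the
    estimated probability that each sentence is a rationale."""
    if len(sentence_vectors) != len(rationale_probs):
        raise ValueError("one probability is needed per sentence")
    total = sum(rationale_probs)
    if total == 0.0:
        # Assumed fallback: treat all sentences equally.
        weights = [1.0 / len(rationale_probs)] * len(rationale_probs)
    else:
        weights = [p / total for p in rationale_probs]
    dim = len(sentence_vectors[0])
    return [
        sum(w * vec[d] for w, vec in zip(weights, sentence_vectors))
        for d in range(dim)
    ]


# Three toy sentence vectors; the first is judged most likely a rationale.
sentences = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
probs = [0.8, 0.1, 0.1]
doc = document_vector(sentences, probs)
print(doc)
```

Because the first sentence carries most of the weight, the resulting document vector lies close to it, which is also what makes the weights usable as a rough explanation of the prediction.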


Simple Large-scale Relation Extraction from Unstructured Text

Mar 24, 2018
Christos Christodoulopoulos, Arpit Mittal

Knowledge-based question answering relies on the availability of facts, the majority of which cannot be found in structured sources (e.g. Wikipedia info-boxes, Wikidata). One of the major components of extracting facts from unstructured text is Relation Extraction (RE). In this paper we propose a novel method for creating distant (weak) supervision labels for training a large-scale RE system. We also provide new evidence about the effectiveness of neural network approaches by decoupling the model architecture from the feature design of a state-of-the-art neural network system. Surprisingly, a much simpler classifier trained on similar features performs on par with the highly complex neural network system (with a 75x reduction in training time), suggesting that the features are a bigger contributor to the final performance.

* To be published in LREC 2018 


Latent Tree Decomposition Parsers for AMR-to-Text Generation

Sep 01, 2021
Lisa Jin, Daniel Gildea

Graph encoders in AMR-to-text generation models often rely on neighborhood convolutions or global vertex attention. While these approaches apply to general graphs, AMRs may be amenable to encoders that target their tree-like structure. By clustering edges into a hierarchy, a tree decomposition summarizes graph structure. Our model encodes a derivation forest of tree decompositions and extracts an expected tree. From tree node embeddings, it builds graph edge features used in vertex attention of the graph encoder. Encoding TD forests instead of shortest-pairwise paths in a self-attentive baseline raises BLEU by 0.7 and chrF++ by 0.3. The forest encoder also surpasses a convolutional baseline for molecular property prediction by 1.92% ROC-AUC.

* 9 pages 


BHAAV- A Text Corpus for Emotion Analysis from Hindi Stories

Oct 09, 2019
Yaman Kumar, Debanjan Mahata, Sagar Aggarwal, Anmol Chugh, Rajat Maheshwari, Rajiv Ratn Shah

In this paper, we introduce the first and largest Hindi text corpus, named BHAAV (meaning "emotions" in Hindi), for analyzing the emotions that a writer expresses through the characters in a story, as perceived by a narrator/reader. The corpus consists of 20,304 sentences collected from 230 short stories spanning 18 genres such as Inspirational and Mystery. Each sentence has been annotated with one of five emotion categories (anger, joy, suspense, sad, and neutral) by three native Hindi speakers with at least ten years of formal education in Hindi. We also discuss challenges in the annotation of low-resource languages such as Hindi, and discuss the scope of the proposed corpus along with its possible uses. Finally, we provide a detailed analysis of the dataset and train strong baseline classifiers, reporting their performance.


Query-based Attention CNN for Text Similarity Map

Oct 18, 2017
Tzu-Chien Liu, Yu-Hsueh Wu, Hung-Yi Lee

In this paper, we introduce Query-based Attention CNN (QACNN) for Text Similarity Map, an end-to-end neural network for question answering. The network is composed of a compare mechanism, a two-stage CNN architecture with an attention mechanism, and a prediction layer. First, the compare mechanism compares the given passage, the query, and multiple answer choices to build similarity maps. Then, the two-stage CNN architecture extracts features at the word level and the sentence level. At the same time, the attention mechanism helps the CNN focus on the important parts of the passage based on the query information. Finally, the prediction layer selects the most probable answer choice. We evaluate this model on the MovieQA dataset using Plot Synopses only, and achieve 79.99% accuracy, which is the state of the art on the dataset.
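
A similarity map of the kind the compare mechanism builds can be sketched as a matrix of pairwise scores between passage word vectors and query word vectors. The sketch below assumes cosine similarity as the scoring function and uses invented two-dimensional word vectors; the function names `cosine` and `similarity_map` are hypothetical, and the CNN and attention stages that would consume the map are not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_map(passage_vectors, query_vectors):
    """Rows index passage words, columns index query words."""
    return [[cosine(p, q) for q in query_vectors] for p in passage_vectors]


# Toy word vectors, invented for illustration only.
passage = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [[1.0, 0.0], [0.0, 2.0]]
sim = similarity_map(passage, query)

for row in sim:
    print(row)
```

The same function could be applied to a passage and each answer choice to obtain one map per choice.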


Text Transformations in Contrastive Self-Supervised Learning: A Review

Mar 22, 2022
Amrita Bhattacharjee, Mansooreh Karami, Huan Liu

Contrastive self-supervised learning has become a prominent technique in representation learning. The main step in these methods is to contrast semantically similar and dissimilar pairs of samples. However, in the domain of natural language, designing augmentation methods that create similar pairs while satisfying the assumptions of contrastive learning is challenging. This is because even modifying a single word in the input might change the meaning of the sentence and hence violate the distributional hypothesis. In this review paper, we formalize the contrastive learning framework in the domain of natural language processing. We emphasize the considerations that need to be addressed in the data transformation step and review the state-of-the-art methods and evaluations for contrastive representation learning in NLP. Finally, we describe some challenges and potential directions for learning better text representations using contrastive methods.

* under review at IJCAI'22 Survey Track 


Geocoding Without Geotags: A Text-based Approach for reddit

Oct 07, 2018
Keith Harrigian

In this paper, we introduce the first geolocation inference approach for reddit, a social media platform where user pseudonymity has thus far made supervised demographic inference difficult to implement and validate. In particular, we design a text-based heuristic schema to generate ground truth location labels for reddit users in the absence of explicitly geotagged data. After evaluating the accuracy of our labeling procedure, we train and test several geolocation inference models across our reddit data set and three benchmark Twitter geolocation data sets. Ultimately, we show that geolocation models trained and applied on the same domain substantially outperform models attempting to transfer training data across domains, even more so on reddit where platform-specific interest-group metadata can be used to improve inferences.

* Accepted to the EMNLP Workshop on Noisy User-generated Text (W-NUT). Brussels, Belgium. November 1, 2018 


Word-Level Alignment of Paper Documents with their Electronic Full-Text Counterparts

Apr 30, 2021
Mark-Christoph Müller, Sucheta Ghosh, Ulrike Wittig, Maja Rey

We describe a simple procedure for the automatic creation of word-level alignments between printed documents and their respective full-text versions. The procedure is unsupervised, uses standard, off-the-shelf components only, and reaches an F-score of 85.01 in the basic setup and up to 86.63 when using pre- and post-processing. Potential areas of application are manual database curation (incl. document triage) and biomedical expression OCR.

* to appear in Proceedings of BioNLP 2021 
