"Text": models, code, and papers

Text similarity analysis for evaluation of descriptive answers

May 06, 2021
Vedant Bahel, Achamma Thomas

Keeping in mind the need for intelligent systems in the educational sector, this paper proposes an automated, text-analysis-based approach for evaluating the descriptive answers in an examination. In particular, the research focuses on applying Natural Language Processing and Data Mining techniques to a computer-aided examination evaluation system. The paper presents an architecture for the fair evaluation of answer sheets. In this architecture, the examiner creates a sample answer sheet for a given set of questions. The final score for each answer is calculated using text summarization, text semantics, and keyword summarization. The text similarity model is based on the Siamese Manhattan LSTM (MaLSTM). The results of this research were compared with manually graded assignments and with other existing systems. The approach was found to be very efficient and suitable for implementation at an institution or a university.

* 7 pages, 4 figures 
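
The abstract names the Siamese Manhattan LSTM (MaLSTM) as the similarity model. In MaLSTM, both texts pass through LSTM branches that share weights, and the similarity of their final hidden states is the exponentiated negative L1 (Manhattan) distance, exp(-||h_a - h_b||_1), which lies in (0, 1]. The minimal sketch below covers only that similarity function. The hidden-state values are hypothetical stand-ins for LSTM outputs, and how the paper combines this score with its summarization and keyword scores is not reproduced here.

```python
import math
from typing import Sequence


def malstm_similarity(h_a: Sequence[float], h_b: Sequence[float]) -> float:
    """Return exp(-||h_a - h_b||_1), the Manhattan similarity used by MaLSTM.

    h_a and h_b stand in for the final hidden states of the two
    weight-sharing LSTM branches (assumption: already computed elsewhere).
    """
    if len(h_a) != len(h_b):
        raise ValueError("hidden states must have the same dimension")
    manhattan = sum(abs(a - b) for a, b in zip(h_a, h_b))
    return math.exp(-manhattan)


# Hypothetical hidden states for the examiner's sample answer and a student answer.
sample_answer_state = [0.2, -0.1, 0.5]
student_answer_state = [0.1, -0.1, 0.4]

score = malstm_similarity(sample_answer_state, student_answer_state)
print(f"similarity = {score:.4f}")  # L1 distance is about 0.2, so this prints 0.8187
```

Identical states give a similarity of 1.0, and the score decays toward 0 as the L1 distance grows, so it behaves as a bounded similarity rather than a raw distance.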

Style Example-Guided Text Generation using Generative Adversarial Transformers

Mar 02, 2020
Kuo-Hao Zeng, Mohammad Shoeybi, Ming-Yu Liu

We introduce a generative language model framework for generating a styled paragraph based on a context sentence and a style reference example. The framework consists of a style encoder and a text decoder. The style encoder extracts a style code from the reference example, and the text decoder generates text based on the style code and the context. We propose a novel objective function to train our framework. We also investigate different network design choices. We validate the effectiveness of the proposed framework through extensive experiments against strong baselines, using a newly collected dataset with diverse text styles. Both code and dataset will be released upon publication.

Annotating Character Relationships in Literary Texts

Dec 02, 2015
Philip Massey, Patrick Xia, David Bamman, Noah A. Smith

We present a dataset of manually annotated relationships between characters in literary texts, in order to support the training and evaluation of automatic methods for relation type prediction in this domain (Makazhanov et al., 2014; Kokkinakis, 2013) and the broader computational analysis of literary character (Elson et al., 2010; Bamman et al., 2014; Vala et al., 2015; Flekova and Gurevych, 2015). In this work, we solicit annotations from workers on Amazon Mechanical Turk for 109 texts ranging from Homer's _Iliad_ to Joyce's _Ulysses_ on four dimensions of interest: for a given pair of characters, we collect judgments as to the coarse-grained category (professional, social, familial), fine-grained category (friend, lover, parent, rival, employer), and affinity (positive, negative, neutral) that describes their primary relationship in a text. We do not assume that this relationship is static; we also collect judgments as to whether it changes at any point in the course of the text.

VML-MOC: Segmenting a multiply oriented and curved handwritten text lines dataset

Jan 19, 2021
Berat Kurar Barakat, Rafi Cohen, Irina Rabaev, Jihad El-Sana

This paper presents a natural and highly challenging dataset of handwritten documents with multiply oriented and curved text lines, the VML-MOC dataset. These text lines were written as remarks in the page margins by different writers over the years. They appear at different locations, with orientations ranging between 0 and 180 degrees, or in curvilinear forms. We evaluate a multi-oriented Gaussian-based method for segmenting these handwritten text lines, which may be skewed or curved in any orientation. It achieves a mean pixel Intersection over Union score of 80.96% on the test documents. The results are compared with those of a single-oriented Gaussian-based text line segmentation method.
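
The reported metric is a mean pixel Intersection over Union (IoU). For one document, pixel IoU is the number of pixels labelled as text in both the prediction and the ground truth, divided by the number labelled as text in either. The sketch below assumes binary foreground/background masks and a plain average over documents; the paper may instead compute IoU per text line or per label, so treat this as an illustration of the metric, not of its exact evaluation protocol. The masks are hypothetical.

```python
from typing import List, Sequence

# A binary mask: 1 marks a text-line pixel, 0 marks background.
Mask = Sequence[Sequence[int]]


def pixel_iou(pred: Mask, truth: Mask) -> float:
    """Intersection over Union of the foreground pixels of two equal-size masks."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    intersection = union = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            intersection += int(bool(p) and bool(t))
            union += int(bool(p) or bool(t))
    # Two empty masks agree perfectly.
    return intersection / union if union else 1.0


def mean_pixel_iou(preds: List[Mask], truths: List[Mask]) -> float:
    """Average the per-document IoU over a test set (assumed averaging scheme)."""
    if len(preds) != len(truths) or not preds:
        raise ValueError("need one prediction per ground-truth document")
    return sum(pixel_iou(p, t) for p, t in zip(preds, truths)) / len(preds)


# Two tiny hypothetical "documents".
pred_1 = [[1, 1, 0], [0, 1, 0]]
true_1 = [[1, 0, 0], [0, 1, 1]]
pred_2 = [[0, 1], [1, 0]]
true_2 = [[0, 1], [1, 0]]

print(pixel_iou(pred_1, true_1))                           # 0.5
print(mean_pixel_iou([pred_1, pred_2], [true_1, true_2]))  # 0.75
```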

Arabic Language Text Classification Using Dependency Syntax-Based Feature Selection

Oct 17, 2014
Yannis Haralambous, Yassir Elidrissi, Philippe Lenca

We study the performance of Arabic text classification combining various techniques: (a) tf-idf vs. dependency syntax, for feature selection and weighting; (b) class association rules vs. support vector machines, for classification. The Arabic text is used in two forms: rootified and lightly stemmed. The results we obtain show that lightly stemmed text leads to better performance than rootified text; that class association rules are better suited for small feature sets obtained by dependency syntax constraints; and, finally, that support vector machines are better suited for large feature sets based on morphological feature selection criteria.

* 10 pages, 4 figures, accepted at CITALA 2014
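
For reference, the tf-idf side of the comparison weights a term by how often it occurs in a document, discounted by how many documents contain it. The sketch below uses the common tf(t, d) * log(N / df(t)) variant (one of several in use; the paper's exact formula is not given here) plus a simple, hypothetical top-k selection over summed weights. The tokens are invented Buckwalter-style transliterated stems standing in for rootified or lightly stemmed Arabic text; the dependency-syntax alternative is not shown.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight each term by (relative term frequency) * log(N / document frequency)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append(
            {term: (c / len(doc)) * math.log(n_docs / df[term]) for term, c in counts.items()}
        )
    return weighted


def select_top_k(weighted: List[Dict[str, float]], k: int) -> List[str]:
    """Hypothetical selection rule: keep the k terms with the largest summed weight."""
    totals: Counter = Counter()
    for doc_weights in weighted:
        totals.update(doc_weights)
    return [term for term, _ in totals.most_common(k)]


# Hypothetical stemmed documents; "ktb" occurs everywhere, so its idf is log(1) = 0.
docs = [["ktb", "drs", "drs"], ["ktb", "lEb"], ["ktb", "drs"]]
weights = tfidf(docs)
print(select_top_k(weights, 2))  # ['lEb', 'drs']
```

A term that appears in every document gets zero weight, which is why tf-idf favours discriminative terms over frequent but uninformative ones.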

An Unsupervised Cross-Modal Hashing Method Robust to Noisy Training Image-Text Correspondences in Remote Sensing

Feb 26, 2022
Georgii Mikriukov, Mahdyar Ravanbakhsh, Begüm Demir

The development of accurate and scalable cross-modal image-text retrieval methods, where queries from one modality (e.g., text) can be matched to archive entries from another (e.g., remote sensing images), has attracted great attention in remote sensing (RS). Most existing methods assume that a reliable multi-modal training set with accurately matched text-image pairs is available. However, this assumption may not always hold, since multi-modal training sets may include noisy pairs (i.e., the textual descriptions or captions associated with training images can be noisy), which distorts the learning process of the retrieval methods. To address this problem, we propose a novel unsupervised cross-modal hashing method robust to noisy image-text correspondences (CHNR). CHNR consists of three modules: 1) a feature extraction module, which extracts feature representations of image-text pairs; 2) a noise detection module, which detects potentially noisy correspondences; and 3) a hashing module, which generates cross-modal binary hash codes. The proposed CHNR includes two training phases: i) a meta-learning phase that uses a small portion of clean (i.e., reliable) data to train the noise detection module in an adversarial fashion; and ii) the main training phase, in which the trained noise detection module identifies noisy correspondences while the hashing module is trained on the noisy multi-modal training set. Experimental results show that the proposed CHNR outperforms state-of-the-art methods. Our code is publicly available at
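
The hashing module maps images and texts to binary codes so that cross-modal retrieval reduces to comparing bit strings. The generic sketch below shows only that retrieval step, ranking archive images by Hamming distance to a text query's code. It assumes the codes have already been produced and does not reproduce CHNR's feature extraction, noise detection, or training; the 4-bit codes are hypothetical.

```python
from typing import List, Tuple


def hamming(code_a: int, code_b: int) -> int:
    """Number of differing bits between two non-negative binary hash codes stored as ints."""
    return bin(code_a ^ code_b).count("1")


def rank_by_hamming(query: int, archive: List[int]) -> List[Tuple[int, int]]:
    """Return (archive index, distance) pairs, nearest first; ties keep archive order."""
    distances = [(i, hamming(query, code)) for i, code in enumerate(archive)]
    return sorted(distances, key=lambda pair: pair[1])


# Hypothetical 4-bit codes: one text query and three archived RS images.
text_query = 0b1011
image_codes = [0b1001, 0b0100, 0b1011]
print(rank_by_hamming(text_query, image_codes))  # [(2, 0), (0, 1), (1, 4)]
```

Comparing codes with an XOR and a bit count is what makes hashing-based retrieval cheap over large archives, compared with computing distances between real-valued feature vectors.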

Towards a Robust Deep Neural Network in Text Domain A Survey

Apr 14, 2019
Wenqi Wang, Lina Wang, Benxiao Tang, Run Wang, Aoshuang Ye

Deep neural networks (DNNs) have shown an inherent vulnerability to adversarial examples, which attackers maliciously craft from real examples in order to make target DNNs misbehave. Adversarial examples threaten image, voice, speech, and text recognition and classification alike. Inspired by previous work, research on adversarial attacks and defenses in the text domain has developed rapidly. To give readers a general understanding of the field, this article presents a comprehensive review of adversarial examples in text. We analyze the advantages and shortcomings of recent adversarial example generation methods and discuss the efficiency and limitations of countermeasures. Finally, we discuss the challenges posed by adversarial texts and suggest research directions in this area.

Survey of Text-based Epidemic Intelligence: A Computational Linguistic Perspective

Mar 14, 2019
Aditya Joshi, Sarvnaz Karimi, Ross Sparks, Cecile Paris, C Raina MacIntyre

Epidemic intelligence deals with the detection of disease outbreaks using formal (e.g., hospital records) and informal (e.g., user-generated text on the web) sources of information. In this survey, we discuss approaches for epidemic intelligence that use textual datasets, referring to it as "text-based epidemic intelligence". We view past work in terms of two broad categories: health mention classification (selecting relevant text from a large volume) and health event detection (predicting epidemic events from a collection of relevant text). The focus of our discussion is the underlying computational linguistic techniques in the two categories. The survey also provides details of the state of the art in annotation techniques, resources, and evaluation strategies for epidemic intelligence.

* This paper is under review at ACM Computing Surveys. This version of the paper does not use the ACM Computing Surveys stylesheet. This arXiv version is to solicit feedback 

An NLP Approach to a Specific Type of Texts: Car Accident Reports

Feb 23, 1995
Dominique Estival, Francoise Gayral

The work reported here is the result of a study done within a larger project on the "Semantics of Natural Languages", viewed from the perspective of Artificial Intelligence and Computational Linguistics. In this project, we have chosen a corpus of insurance claim reports. These texts deal with a relatively circumscribed domain, that of road traffic, thereby limiting the extra-linguistic knowledge necessary to understand them. Moreover, these texts present a number of very specific characteristics, insofar as they are written in a quasi-institutional setting that imposes many constraints on their production. We first determine what these constraints are, in order to then show how they provide the writer with the means to create as succinct a text as possible and, symmetrically, how they provide the reader with the means to interpret the text and to distinguish between its factual and argumentative aspects.

* 20 pages 

Scene Text Detection for Augmented Reality -- Character Bigram Approach to reduce False Positive Rate

Dec 26, 2020
Sagar Gubbi, Bharadwaj Amrutur

Natural scene text detection is an important aspect of scene understanding and could be a useful tool in building engaging augmented reality applications. In this work, we address the problem of false positives in text spotting. We propose improving the performance of sliding-window text spotters by looking for character pairs (bigrams) rather than single characters. An efficient convolutional neural network is designed and trained to detect bigrams. The proposed detector reduces the false positive rate by 28.16% on the ICDAR 2015 dataset. We demonstrate that detecting bigrams is a computationally inexpensive way to improve sliding-window text spotters.
