Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Text": models, code, and papers

Text Revision by On-the-Fly Representation Optimization

Apr 15, 2022
Jingjing Li, Zichao Li, Tao Ge, Irwin King, Michael R. Lyu

Text revision refers to a family of natural language generation tasks, where the source and target sequences share moderate resemblance in surface form but differentiate in attributes, such as text formality and simplicity. Current state-of-the-art methods formulate these tasks as sequence-to-sequence learning problems, which rely on large-scale parallel training corpus. In this paper, we present an iterative in-place editing approach for text revision, which requires no parallel data. In this approach, we simply fine-tune a pre-trained Transformer with masked language modeling and attribute classification. During inference, the editing at each iteration is realized by two-step span replacement. At the first step, the distributed representation of the text optimizes on the fly towards an attribute function. At the second step, a text span is masked and another new one is proposed conditioned on the optimized representation. The empirical experiments on two typical and important text revision tasks, text formalization and text simplification, show the effectiveness of our approach. It achieves competitive and even better performance than state-of-the-art supervised methods on text simplification, and gains better performance than strong unsupervised methods on text formalization \footnote{Code and model are available at \url{}}.

* AAAI 2022 

  Access Paper or Ask Questions

End-To-End Measure for Text Recognition

Aug 26, 2019
Gundram Leifert, Roger Labahn, Tobias Grüning, Svenja Leifert

Measuring the performance of text recognition and text line detection engines is an important step to objectively compare systems and their configuration. There exist well-established measures for both tasks separately. However, there is no sophisticated evaluation scheme to measure the quality of a combined text line detection and text recognition system. The F-measure on word level is a well-known methodology, which is sometimes used in this context. Nevertheless, it does not take into account the alignment of hypothesis and ground truth text and can lead to deceptive results. Since users of automatic information retrieval pipelines in the context of text recognition are mainly interested in the end-to-end performance of a given system, there is a strong need for such a measure. Hence, we present a measure to evaluate the quality of an end-to-end text recognition system. The basis for this measure is the well established and widely used character error rate, which is limited -- in its original form -- to aligned hypothesis and ground truth texts. The proposed measure is flexible in a way that it can be configured to penalize different reading orders between the hypothesis and ground truth and can take into account the geometric position of the text lines. Additionally, it can ignore over- and under- segmentation of text lines. With these parameters it is possible to get a measure fitting best to its own needs.

* to appear in proceeding at ICDAR 2019 

  Access Paper or Ask Questions

Simultaneous Recognition of Horizontal and Vertical Text in Natural Images

Dec 06, 2018
Chankyu Choi, Youngmin Yoon, Junsu Lee, Junseok Kim

Recent state-of-the-art scene text recognition methods have primarily focused on horizontal text in images. However, in several Asian countries, including China, large amounts of text in signs, books, and TV commercials are vertically directed. Because the horizontal and vertical texts exhibit different characteristics, developing an algorithm that can simultaneously recognize both types of text in real environments is necessary. To address this problem, we adopted the direction encoding mask (DEM) and selective attention network (SAN) methods based on supervised learning. DEM contains directional information to compensate in cases that lack text direction; therefore, our network is trained using this information to handle the vertical text. The SAN method is designed to work individually for both types of text. To train the network to recognize both types of text and to evaluate the effectiveness of the designed model, we prepared a new synthetic vertical text dataset and collected an actual vertical text dataset (VTD142) from the Web. Using these datasets, we proved that our proposed model can accurately recognize both vertical and horizontal text and can achieve state-of-the-art results in experiments using benchmark datasets, including the street view test (SVT), IIIT-5k, and ICDAR. Although our model is relatively simple as compared to its predecessors, it maintains the accuracy and is trained in an end-to-end manner.

* Presented at the IWRR workshop at ACCV 2018 

  Access Paper or Ask Questions

Label or Message: A Large-Scale Experimental Survey of Texts and Objects Co-Occurrence

Jul 30, 2020
Koki Takeshita, Juntaro Shioyama, Seiichi Uchida

Our daily life is surrounded by textual information. Nowadays, the automatic collection of textual information becomes possible owing to the drastic improvement of scene text detectors and recognizer. The purpose of this paper is to conduct a large-scale survey of co-occurrence between visual objects (such as book and car) and scene texts with a large image dataset and a state-of-the-art scene text detector and recognizer. Especially, we focus on the function of "label" texts, which are attached to objects for detailing the objects. By analyzing co-occurrence between objects and scene texts, it is possible to observe the statistics about the label texts and understand how the scene texts will be useful for recognizing the objects and vice versa.

* Accepted at ICPR2020 

  Access Paper or Ask Questions

TextBoxes++: A Single-Shot Oriented Scene Text Detector

Apr 27, 2018
Minghui Liao, Baoguang Shi, Xiang Bai

Scene text detection is an important step of scene text recognition system and also a challenging problem. Different from general object detection, the main challenges of scene text detection lie on arbitrary orientations, small sizes, and significantly variant aspect ratios of text in natural images. In this paper, we present an end-to-end trainable fast scene text detector, named TextBoxes++, which detects arbitrary-oriented scene text with both high accuracy and efficiency in a single network forward pass. No post-processing other than an efficient non-maximum suppression is involved. We have evaluated the proposed TextBoxes++ on four public datasets. In all experiments, TextBoxes++ outperforms competing methods in terms of text localization accuracy and runtime. More specifically, TextBoxes++ achieves an f-measure of 0.817 at 11.6fps for 1024*1024 ICDAR 2015 Incidental text images, and an f-measure of 0.5591 at 19.8fps for 768*768 COCO-Text images. Furthermore, combined with a text recognizer, TextBoxes++ significantly outperforms the state-of-the-art approaches for word spotting and end-to-end text recognition tasks on popular benchmarks. Code is available at:

* IEEE Transactions on Image Processing 27 (2018) 3676-3690 
* 15 pages 

  Access Paper or Ask Questions

Normalizing Text using Language Modelling based on Phonetics and String Similarity

Jun 25, 2020
Fenil Doshi, Jimit Gandhi, Deep Gosalia, Sudhir Bagul

Social media networks and chatting platforms often use an informal version of natural text. Adversarial spelling attacks also tend to alter the input text by modifying the characters in the text. Normalizing these texts is an essential step for various applications like language translation and text to speech synthesis where the models are trained over clean regular English language. We propose a new robust model to perform text normalization. Our system uses the BERT language model to predict the masked words that correspond to the unnormalized words. We propose two unique masking strategies that try to replace the unnormalized words in the text with their root form using a unique score based on phonetic and string similarity metrics.We use human-centric evaluations where volunteers were asked to rank the normalized text. Our strategies yield an accuracy of 86.7% and 83.2% which indicates the effectiveness of our system in dealing with text normalization.

* Author 1, 2 and 3 are equal contributors; Number of pages: 9; Number of figures: 3 

  Access Paper or Ask Questions

WikiGraphs: A Wikipedia Text - Knowledge Graph Paired Dataset

Jul 20, 2021
Luyu Wang, Yujia Li, Ozlem Aslan, Oriol Vinyals

We present a new dataset of Wikipedia articles each paired with a knowledge graph, to facilitate the research in conditional text generation, graph generation and graph representation learning. Existing graph-text paired datasets typically contain small graphs and short text (1 or few sentences), thus limiting the capabilities of the models that can be learned on the data. Our new dataset WikiGraphs is collected by pairing each Wikipedia article from the established WikiText-103 benchmark (Merity et al., 2016) with a subgraph from the Freebase knowledge graph (Bollacker et al., 2008). This makes it easy to benchmark against other state-of-the-art text generative models that are capable of generating long paragraphs of coherent text. Both the graphs and the text data are of significantly larger scale compared to prior graph-text paired datasets. We present baseline graph neural network and transformer model results on our dataset for 3 tasks: graph -> text generation, graph -> text retrieval and text -> graph retrieval. We show that better conditioning on the graph provides gains in generation and retrieval quality but there is still large room for improvement.

  Access Paper or Ask Questions

Visual Semantic Re-ranker for Text Spotting

Oct 27, 2018
Ahmed Sabir, Francesc Moreno-Noguer, Lluís Padró

Many current state-of-the-art methods for text recognition are based on purely local information and ignore the semantic correlation between text and its surrounding visual context. In this paper, we propose a post-processing approach to improve the accuracy of text spotting by using the semantic relation between the text and the scene. We initially rely on an off-the-shelf deep neural network that provides a series of text hypotheses for each input image. These text hypotheses are then re-ranked using the semantic relatedness with the object in the image. As a result of this combination, the performance of the original network is boosted with a very low computational cost. The proposed framework can be used as a drop-in complement for any text-spotting algorithm that outputs a ranking of word hypotheses. We validate our approach on ICDAR'17 shared task dataset.

  Access Paper or Ask Questions

Investigating Pretrained Language Models for Graph-to-Text Generation

Jul 16, 2020
Leonardo F. R. Ribeiro, Martin Schmitt, Hinrich Schütze, Iryna Gurevych

Graph-to-text generation, a subtask of data-to-text generation, aims to generate fluent texts from graph-based data. Many graph-to-text models have shown strong performance in this task employing specialized graph encoders. However, recent approaches employ large pretrained language models (PLMs) achieving state-of-the-art results in data-to-text generation. In this paper, we aim to investigate the impact of large PLMs in graph-to-text generation. We present a study across three graph domains: meaning representations, Wikipedia knowledge graphs (KGs) and scientific KGs. Our analysis shows that PLMs such as BART and T5 achieve state-of-the-art results in graph-to-text benchmarks without explicitly encoding the graph structure. We also demonstrate that task-adaptive pretraining strategies are beneficial to the target task, improving even further the state of the art in two benchmarks for graph-to-text generation. In a final analysis, we investigate possible reasons for the PLMs' success on graph-to-text tasks. We find evidence that their knowledge about the world gives them a big advantage, especially when generating texts from KGs.

  Access Paper or Ask Questions

Fast Clustering of Short Text Streams Using Efficient Cluster Indexing and Dynamic Similarity Thresholds

Jan 21, 2021
Md Rashadul Hasan Rakib, Muhammad Asaduzzaman

Short text stream clustering is an important but challenging task since massive amount of text is generated from different sources such as micro-blogging, question-answering, and social news aggregation websites. One of the major challenges of clustering such massive amount of text is to cluster them within a reasonable amount of time. The existing state-of-the-art short text stream clustering methods can not cluster such massive amount of text within a reasonable amount of time as they compute similarities between a text and all the existing clusters to assign that text to a cluster. To overcome this challenge, we propose a fast short text stream clustering method (called FastStream) that efficiently index the clusters using inverted index and compute similarity between a text and a selected number of clusters while assigning a text to a cluster. In this way, we not only reduce the running time of our proposed method but also reduce the running time of several state-of-the-art short text stream clustering methods. FastStream assigns a text to a cluster (new or existing) using the dynamically computed similarity thresholds based on statistical measure. Thus our method efficiently deals with the concept drift problem. Experimental results demonstrate that FastStream outperforms the state-of-the-art short text stream clustering methods by a significant margin on several short text datasets. In addition, the running time of FastStream is several orders of magnitude faster than that of the state-of-the-art methods.

* 6 

  Access Paper or Ask Questions