Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Text": models, code, and papers

Learning to increase matching efficiency in identifying additional b-jets in the $\text{t}\bar{\text{t}}\text{b}\bar{\text{b}}$ process

Mar 16, 2021
Cheongjae Jang, Sang-Kyun Ko, Yung-Kyun Noh, Jieun Choi, Jongwon Lim, Tae Jeong Kim

The $\text{t}\bar{\text{t}}\text{H}(\text{b}\bar{\text{b}})$ process is an essential channel to reveal the Higgs properties but has an irreducible background from the $\text{t}\bar{\text{t}}\text{b}\bar{\text{b}}$ process, which produces a top quark pair in association with a b quark pair. Therefore, understanding the $\text{t}\bar{\text{t}}\text{b}\bar{\text{b}}$ process is crucial for improving the sensitivity of a search for the $\text{t}\bar{\text{t}}\text{H}(\text{b}\bar{\text{b}})$ process. To this end, when measuring the differential cross-section of the $\text{t}\bar{\text{t}}\text{b}\bar{\text{b}}$ process, we need to distinguish the b-jets originated from top quark decays, and additional b-jets originated from gluon splitting. Since there are no simple identification rules, we adopt deep learning methods to learn from data to identify the additional b-jets from the $\text{t}\bar{\text{t}}\text{b}\bar{\text{b}}$ events. Specifically, by exploiting the special structure of the $\text{t}\bar{\text{t}}\text{b}\bar{\text{b}}$ event data, we propose several loss functions that can be minimized to directly increase the matching efficiency, the accuracy of identifying additional b-jets. We discuss the difference between our method and another deep learning-based approach based on binary classification arXiv:1910.14535 using synthetic data. We then verify that additional b-jets can be identified more accurately by increasing matching efficiency directly rather than the binary classification accuracy, using simulated $\text{t}\bar{\text{t}}\text{b}\bar{\text{b}}$ event data in the lepton+jets channel from pp collision at $\sqrt{s}$ = 13 TeV.

  Access Paper or Ask Questions

Total-Text: A Comprehensive Dataset for Scene Text Detection and Recognition

Oct 28, 2017
Chee Kheng Chng, Chee Seng Chan

Text in curve orientation, despite being one of the common text orientations in real world environment, has close to zero existence in well received scene text datasets such as ICDAR2013 and MSRA-TD500. The main motivation of Total-Text is to fill this gap and facilitate a new research direction for the scene text community. On top of the conventional horizontal and multi-oriented texts, it features curved-oriented text. Total-Text is highly diversified in orientations, more than half of its images have a combination of more than two orientations. Recently, a new breed of solutions that casted text detection as a segmentation problem has demonstrated their effectiveness against multi-oriented text. In order to evaluate its robustness against curved text, we fine-tuned DeconvNet and benchmark it on Total-Text. Total-Text with its annotation is available at

* Accepted as Oral presentation in ICDAR2017 (Extended version, 13 pages 17 figures). We introduce a new scene text dataset namely as Total-Text, which is more comprehensive than the existing scene text datasets as it consists of 1555 natural images with more than 3 different text orientations, one of a kind 

  Access Paper or Ask Questions

TAP: Text-Aware Pre-training for Text-VQA and Text-Caption

Dec 08, 2020
Zhengyuan Yang, Yijuan Lu, Jianfeng Wang, Xi Yin, Dinei Florencio, Lijuan Wang, Cha Zhang, Lei Zhang, Jiebo Luo

In this paper, we propose Text-Aware Pre-training (TAP) for Text-VQA and Text-Caption tasks. These two tasks aim at reading and understanding scene text in images for question answering and image caption generation, respectively. In contrast to the conventional vision-language pre-training that fails to capture scene text and its relationship with the visual and text modalities, TAP explicitly incorporates scene text (generated from OCR engines) in pre-training. With three pre-training tasks, including masked language modeling (MLM), image-text (contrastive) matching (ITM), and relative (spatial) position prediction (RPP), TAP effectively helps the model learn a better aligned representation among the three modalities: text word, visual object, and scene text. Due to this aligned representation learning, even pre-trained on the same downstream task dataset, TAP already boosts the absolute accuracy on the TextVQA dataset by +5.4%, compared with a non-TAP baseline. To further improve the performance, we build a large-scale dataset based on the Conceptual Caption dataset, named OCR-CC, which contains 1.4 million scene text-related image-text pairs. Pre-trained on this OCR-CC dataset, our approach outperforms the state of the art by large margins on multiple tasks, i.e., +8.3% accuracy on TextVQA, +8.6% accuracy on ST-VQA, and +10.2 CIDEr score on TextCaps.

  Access Paper or Ask Questions

A Semantic Relevance Based Neural Network for Text Summarization and Text Simplification

Oct 06, 2017
Shuming Ma, Xu Sun

Text summarization and text simplification are two major ways to simplify the text for poor readers, including children, non-native speakers, and the functionally illiterate. Text summarization is to produce a brief summary of the main ideas of the text, while text simplification aims to reduce the linguistic complexity of the text and retain the original meaning. Recently, most approaches for text summarization and text simplification are based on the sequence-to-sequence model, which achieves much success in many text generation tasks. However, although the generated simplified texts are similar to source texts literally, they have low semantic relevance. In this work, our goal is to improve semantic relevance between source texts and simplified texts for text summarization and text simplification. We introduce a Semantic Relevance Based neural model to encourage high semantic similarity between texts and summaries. In our model, the source text is represented by a gated attention encoder, while the summary representation is produced by a decoder. Besides, the similarity score between the representations is maximized during training. Our experiments show that the proposed model outperforms the state-of-the-art systems on two benchmark corpus.

  Access Paper or Ask Questions

SA-Text: Simple but Accurate Detector for Text of Arbitrary Shapes

Dec 03, 2019
Qitong Wang, Yi Zheng, Margrit Betke

We introduce a new framework for text detection named SA-Text meaning "Simple but Accurate," which utilizes heatmaps to detect text regions in natural scene images effectively. SA-Text detects text that occurs in various fonts, shapes, and orientations in natural scene images with complicated backgrounds. Experiments on three challenging and public scene-text-detection datasets, Total-Text, SCUT-CTW1500, and MSRA-TD500 show the effectiveness and generalization ability of SA-Text in detecting not only multi-lingual oriented straight but also curved text in scripts of multiple languages. To further show the practicality of SA-Text, we combine it with a powerful state-of-the-art text recognition model and thus propose a pipeline-based text spotting system called SAA ("text spotting" is used as the technical term for "detection and recognition of text"). Our experimental results of SAA on the Total-Text dataset show that SAA outperforms four state-of-the-art text spotting frameworks by at least 9 percent points in the F-measure, which means that SA-Text can be used as a complete text detection and recognition system in real applications.

  Access Paper or Ask Questions

CORE-Text: Improving Scene Text Detection with Contrastive Relational Reasoning

Dec 14, 2021
Jingyang Lin, Yingwei Pan, Rongfeng Lai, Xuehang Yang, Hongyang Chao, Ting Yao

Localizing text instances in natural scenes is regarded as a fundamental challenge in computer vision. Nevertheless, owing to the extremely varied aspect ratios and scales of text instances in real scenes, most conventional text detectors suffer from the sub-text problem that only localizes the fragments of text instance (i.e., sub-texts). In this work, we quantitatively analyze the sub-text problem and present a simple yet effective design, COntrastive RElation (CORE) module, to mitigate that issue. CORE first leverages a vanilla relation block to model the relations among all text proposals (sub-texts of multiple text instances) and further enhances relational reasoning via instance-level sub-text discrimination in a contrastive manner. Such way naturally learns instance-aware representations of text proposals and thus facilitates scene text detection. We integrate the CORE module into a two-stage text detector of Mask R-CNN and devise our text detector CORE-Text. Extensive experiments on four benchmarks demonstrate the superiority of CORE-Text. Code is available: \url{}.

* ICME 2021 (Oral); Code is publicly available at: 

  Access Paper or Ask Questions

Text Guide: Improving the quality of long text classification by a text selection method based on feature importance

Apr 15, 2021
Krzysztof Fiok, Waldemar Karwowski, Edgar Gutierrez, Mohammad Reza Davahli, Maciej Wilamowski, Tareq Ahram, Awad Al-Juaid, Jozef Zurada

The performance of text classification methods has improved greatly over the last decade for text instances of less than 512 tokens. This limit has been adopted by most state-of-the-research transformer models due to the high computational cost of analyzing longer text instances. To mitigate this problem and to improve classification for longer texts, researchers have sought to resolve the underlying causes of the computational cost and have proposed optimizations for the attention mechanism, which is the key element of every transformer model. In our study, we are not pursuing the ultimate goal of long text classification, i.e., the ability to analyze entire text instances at one time while preserving high performance at a reasonable computational cost. Instead, we propose a text truncation method called Text Guide, in which the original text length is reduced to a predefined limit in a manner that improves performance over naive and semi-naive approaches while preserving low computational costs. Text Guide benefits from the concept of feature importance, a notion from the explainable artificial intelligence domain. We demonstrate that Text Guide can be used to improve the performance of recent language models specifically designed for long text classification, such as Longformer. Moreover, we discovered that parameter optimization is the key to Text Guide performance and must be conducted before the method is deployed. Future experiments may reveal additional benefits provided by this new method.

  Access Paper or Ask Questions

Extracting Information-rich Part of Texts using Text Denoising

Jul 30, 2013
Rushdi Shams

The aim of this paper is to report on a novel text reduction technique, called Text Denoising, that highlights information-rich content when processing a large volume of text data, especially from the biomedical domain. The core feature of the technique, the text readability index, embodies the hypothesis that complex text is more information-rich than the rest. When applied on tasks like biomedical relation bearing text extraction, keyphrase indexing and extracting sentences describing protein interactions, it is evident that the reduced set of text produced by text denoising is more information-rich than the rest.

* 26th Canadian Conference on Artificial Intelligence (CAI-2013), Regina, Canada, May 29-31, 2013 

  Access Paper or Ask Questions

Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation

Jul 27, 2021
Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Yi-Zhe Song

Text recognition remains a fundamental and extensively researched topic in computer vision, largely owing to its wide array of commercial applications. The challenging nature of the very problem however dictated a fragmentation of research efforts: Scene Text Recognition (STR) that deals with text in everyday scenes, and Handwriting Text Recognition (HTR) that tackles hand-written text. In this paper, for the first time, we argue for their unification -- we aim for a single model that can compete favourably with two separate state-of-the-art STR and HTR models. We first show that cross-utilisation of STR and HTR models trigger significant performance drops due to differences in their inherent challenges. We then tackle their union by introducing a knowledge distillation (KD) based framework. This is however non-trivial, largely due to the variable-length and sequential nature of text sequences, which renders off-the-shelf KD techniques that mostly works with global fixed-length data inadequate. For that, we propose three distillation losses all of which are specifically designed to cope with the aforementioned unique characteristics of text recognition. Empirical evidence suggests that our proposed unified model performs on par with individual models, even surpassing them in certain cases. Ablative studies demonstrate that naive baselines such as a two-stage framework, and domain adaption/generalisation alternatives do not work as well, further verifying the appropriateness of our design.

* IEEE International Conference on Computer Vision (ICCV), 2021 

  Access Paper or Ask Questions