Get our free extension to see links to code for papers anywhere online!

Chrome logo Add to Chrome

Firefox logo Add to Firefox

"Text": models, code, and papers

Methods of Weighted Combination for Text Field Recognition in a Video Stream

Nov 27, 2019
Olga Petrova, Konstantin Bulatov, Vladimir L. Arlazarov

Due to a noticeable expansion of document recognition applicability, there is a high demand for recognition on mobile devices. A mobile camera, unlike a scanner, cannot always ensure the absence of various image distortions, therefore the task of improving the recognition precision is relevant. The advantage of mobile devices over scanners is the ability to use video stream input, which allows to get multiple images of a recognized document. Despite this, not enough attention is currently paid to the issue of combining recognition results obtained from different frames when using video stream input. In this paper we propose a weighted text string recognition results combination method and weighting criteria, and provide experimental data for verifying their validity and effectiveness. Based on the obtained results, it is concluded that the use of such weighted combination is appropriate for improving the quality of the video stream recognition result.

* 6 pages, 4 figures, 1 table, accepted and presented at International Conference on Machine Vision 2019 (ICMV 2019) 

  Access Paper or Ask Questions

Reading Stockholm Riots 2013 in social media by text-mining

Oct 04, 2013
Andrzej Jarynowski, Amir Rostami

The riots in Stockholm in May 2013 were an event that reverberated in the world media for its dimension of violence that had spread through the Swedish capital. In this study we have investigated the role of social media in creating media phenomena via text mining and natural language processing. We have focused on two channels of communication for our analysis: Twitter and (Forum of Polish community in Sweden). Our preliminary results show some hot topics driving discussion related mostly to Swedish Police and Swedish Politics by counting word usage. Typical features for media intervention are presented. We have built networks of most popular phrases, clustered by categories (geography, media institution, etc.). Sentiment analysis shows negative connotation with Police. The aim of this preliminary exploratory quantitative study was to generate questions and hypotheses, which we could carefully follow by deeper more qualitative methods.

* 5p 

  Access Paper or Ask Questions

KenMeSH: Knowledge-enhanced End-to-end Biomedical Text Labelling

Mar 14, 2022
Xindi Wang, Robert E. Mercer, Frank Rudzicz

Currently, Medical Subject Headings (MeSH) are manually assigned to every biomedical article published and subsequently recorded in the PubMed database to facilitate retrieving relevant information. With the rapid growth of the PubMed database, large-scale biomedical document indexing becomes increasingly important. MeSH indexing is a challenging task for machine learning, as it needs to assign multiple labels to each article from an extremely large hierachically organized collection. To address this challenge, we propose KenMeSH, an end-to-end model that combines new text features and a dynamic \textbf{K}nowledge-\textbf{en}hanced mask attention that integrates document features with MeSH label hierarchy and journal correlation features to index MeSH terms. Experimental results show the proposed method achieves state-of-the-art performance on a number of measures.

* main conference at ACL 2022 

  Access Paper or Ask Questions

Discrete Auto-regressive Variational Attention Models for Text Modeling

Jun 16, 2021
Xianghong Fang, Haoli Bai, Jian Li, Zenglin Xu, Michael Lyu, Irwin King

Variational autoencoders (VAEs) have been widely applied for text modeling. In practice, however, they are troubled by two challenges: information underrepresentation and posterior collapse. The former arises as only the last hidden state of LSTM encoder is transformed into the latent space, which is generally insufficient to summarize the data. The latter is a long-standing problem during the training of VAEs as the optimization is trapped to a disastrous local optimum. In this paper, we propose Discrete Auto-regressive Variational Attention Model (DAVAM) to address the challenges. Specifically, we introduce an auto-regressive variational attention approach to enrich the latent space by effectively capturing the semantic dependency from the input. We further design discrete latent space for the variational attention and mathematically show that our model is free from posterior collapse. Extensive experiments on language modeling tasks demonstrate the superiority of DAVAM against several VAE counterparts.

* IJCNN 2021 

  Access Paper or Ask Questions

Efficient, Uncertainty-based Moderation of Neural Networks Text Classifiers

Apr 04, 2022
Jakob Smedegaard Andersen, Walid Maalej

To maximize the accuracy and increase the overall acceptance of text classifiers, we propose a framework for the efficient, in-operation moderation of classifiers' output. Our framework focuses on use cases in which F1-scores of modern Neural Networks classifiers (ca.~90%) are still inapplicable in practice. We suggest a semi-automated approach that uses prediction uncertainties to pass unconfident, probably incorrect classifications to human moderators. To minimize the workload, we limit the human moderated data to the point where the accuracy gains saturate and further human effort does not lead to substantial improvements. A series of benchmarking experiments based on three different datasets and three state-of-the-art classifiers show that our framework can improve the classification F1-scores by 5.1 to 11.2% (up to approx.~98 to 99%), while reducing the moderation load up to 73.3% compared to a random moderation.

  Access Paper or Ask Questions

AraELECTRA: Pre-Training Text Discriminators for Arabic Language Understanding

Dec 31, 2020
Wissam Antoun, Fady Baly, Hazem Hajj

Advances in English language representation enabled a more sample-efficient pre-training task by Efficiently Learning an Encoder that Classifies Token Replacements Accurately (ELECTRA). Which, instead of training a model to recover masked tokens, it trains a discriminator model to distinguish true input tokens from corrupted tokens that were replaced by a generator network. On the other hand, current Arabic language representation approaches rely only on pretraining via masked language modeling. In this paper, we develop an Arabic language representation model, which we name AraELECTRA. Our model is pretrained using the replaced token detection objective on large Arabic text corpora. We evaluate our model on two Arabic reading comprehension tasks, and we show that AraELECTRA outperforms current state-of-the-art Arabic language representation models given the same pretraining data and with even a smaller model size.

  Access Paper or Ask Questions

End-to-End Learning for Answering Structured Queries Directly over Text

Nov 16, 2018
Paul Groth, Antony Scerri, Ron Daniel, Jr., Bradley P. Allen

Structured queries expressed in languages (such as SQL, SPARQL, or XQuery) offer a convenient and explicit way for users to express their information needs for a number of tasks. In this work, we present an approach to answer these directly over text data without storing results in a database. We specifically look at the case of knowledge bases where queries are over entities and the relations between them. Our approach combines distributed query answering (e.g. Triple Pattern Fragments) with models built for extractive question answering. Importantly, by applying distributed querying answering we are able to simplify the model learning problem. We train models for a large portion (572) of the relations within Wikidata and achieve an average 0.70 F1 measure across all models. We also present a systematic method to construct the necessary training data for this task from knowledge graphs and describe a prototype implementation.

* 18 pages, 6 figures 

  Access Paper or Ask Questions

CLIPDraw: Exploring Text-to-Drawing Synthesis through Language-Image Encoders

Jun 28, 2021
Kevin Frans, L. B. Soros, Olaf Witkowski

This work presents CLIPDraw, an algorithm that synthesizes novel drawings based on natural language input. CLIPDraw does not require any training; rather a pre-trained CLIP language-image encoder is used as a metric for maximizing similarity between the given description and a generated drawing. Crucially, CLIPDraw operates over vector strokes rather than pixel images, a constraint that biases drawings towards simpler human-recognizable shapes. Results compare between CLIPDraw and other synthesis-through-optimization methods, as well as highlight various interesting behaviors of CLIPDraw, such as satisfying ambiguous text in multiple ways, reliably producing drawings in diverse artistic styles, and scaling from simple to complex visual representations as stroke count is increased. Code for experimenting with the method is available at:

  Access Paper or Ask Questions

Assessing perceived organizational leadership styles through twitter text mining

May 24, 2021
A. La Bella, A. Fronzetti Colladon, E. Battistoni, S. Castellan, M. Francucci

We propose a text classification tool based on support vector machines for the assessment of organizational leadership styles, as appearing to Twitter users. We collected Twitter data over 51 days, related to the first 30 Italian organizations in the 2015 ranking of Forbes Global 2000-out of which we selected the five with the most relevant volumes of tweets. We analyzed the communication of the company leaders, together with the dialogue among the stakeholders of each company, to understand the association with perceived leadership styles and dimensions. To assess leadership profiles, we referred to the 10-factor model developed by Barchiesi and La Bella in 2007. We maintain the distinctiveness of the approach we propose, as it allows a rapid assessment of the perceived leadership capabilities of an enterprise, as they emerge from its social media interactions. It can also be used to show how companies respond and manage their communication when specific events take place, and to assess their stakeholder's reactions.

* Journal of the Association for Information Science and Technology 61(1), 21-31 (2018) 

  Access Paper or Ask Questions

Conditioned Text Generation with Transfer for Closed-Domain Dialogue Systems

Nov 03, 2020
Stéphane d'Ascoli, Alice Coucke, Francesco Caltagirone, Alexandre Caulier, Marc Lelarge

Scarcity of training data for task-oriented dialogue systems is a well known problem that is usually tackled with costly and time-consuming manual data annotation. An alternative solution is to rely on automatic text generation which, although less accurate than human supervision, has the advantage of being cheap and fast. Our contribution is twofold. First we show how to optimally train and control the generation of intent-specific sentences using a conditional variational autoencoder. Then we introduce a new protocol called query transfer that allows to leverage a large unlabelled dataset, possibly containing irrelevant queries, to extract relevant information. Comparison with two different baselines shows that this method, in the appropriate regime, consistently improves the diversity of the generated queries without compromising their quality. We also demonstrate the effectiveness of our generation method as a data augmentation technique for language modelling tasks.

* arXiv admin note: substantial text overlap with arXiv:1911.03698 

  Access Paper or Ask Questions