"Text": models, code, and papers

Learning from missing data with the Latent Block Model

Oct 23, 2020
Gabriel Frisch, Jean-Benoist Léger, Yves Grandvalet

Missing data can be informative. Ignoring this information can lead to misleading conclusions when the data model does not allow information to be extracted from the missing data. We propose a co-clustering model, based on the Latent Block Model, that aims to take advantage of these nonignorable nonresponses, also known as Missing Not At Random (MNAR) data. A variational expectation-maximization algorithm is derived to perform inference, and a model selection criterion is presented. We assess the proposed approach in a simulation study before applying our model to the voting records from the lower house of the French Parliament, where our analysis brings out relevant groups of MPs and texts, together with a sensible interpretation of the behavior of non-voters.


Facial gesture interfaces for expression and communication

Oct 04, 2020
Michael J. Lyons

Considerable effort has been devoted to the automatic extraction of information about action of the face from image sequences. Within the context of human-computer interaction (HCI) we may distinguish systems that allow expression from those which aim at recognition. Most of the work in facial action processing has been directed at automatically recognizing affect from facial actions. By contrast, facial gesture interfaces, which respond to deliberate facial actions, have received comparatively little attention. This paper reviews several projects on vision-based interfaces that rely on facial action for intentional HCI. Applications to several domains are introduced, including text entry, artistic and musical expression and assistive technology for motor-impaired users.

* 2004 IEEE International Conference on Systems, Man and Cybernetics 
* 6 pages, 8 figures 


Graph-based Modeling of Online Communities for Fake News Detection

Aug 14, 2020
Shantanu Chandra, Pushkar Mishra, Helen Yannakoudakis, Ekaterina Shutova

Over the past few years, there has been substantial effort towards automated detection of fake news. Existing research has modeled the structure, style and content of news articles, as well as the demographic traits of users. However, no attention has been directed towards modeling the properties of online communities that interact with fake news. In this work, we propose a novel approach via graph-based modeling of online communities. Our method aggregates information with respect to: 1) the nature of the content disseminated, 2) content-sharing behavior of users, and 3) the social network of those users. We empirically demonstrate that this yields significant improvements over existing text and user-based techniques for fake news detection.


TextAttack: A Framework for Adversarial Attacks in Natural Language Processing

May 13, 2020
John X. Morris, Eli Lifland, Jin Yong Yoo, Yanjun Qi

TextAttack is a library for running adversarial attacks against natural language processing (NLP) models. TextAttack builds attacks from four components: a search method, goal function, transformation, and a set of constraints. Researchers can use these components to easily assemble new attacks. Individual components can be isolated and compared for easier ablation studies. TextAttack currently supports attacks on models trained for text classification and entailment across a variety of datasets. Additionally, TextAttack's modular design makes it easily extensible to new NLP tasks, models, and attack strategies. TextAttack code and tutorials are available at

* 6 pages. More details are shared at 
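The four-component decomposition described in this abstract can be illustrated with a toy sketch. This is not the TextAttack API: the model, the substitute table, and every function name below are hypothetical and exist only to show how a search method, goal function, transformation, and constraint might fit together.

```python
from typing import Dict, List, Optional

# Toy "model" (hypothetical): label 1 if positive words outnumber negative ones.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "awful"}


def toy_model(text: str) -> int:
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return 1 if score > 0 else 0


# Transformation: propose candidates by swapping one word for a listed substitute.
SUBSTITUTES: Dict[str, List[str]] = {
    "good": ["fine", "decent"],
    "great": ["big", "fine"],
}


def transformation(text: str) -> List[str]:
    words = text.split()
    candidates = []
    for i, word in enumerate(words):
        for sub in SUBSTITUTES.get(word.lower(), []):
            candidates.append(" ".join(words[:i] + [sub] + words[i + 1:]))
    return candidates


# Constraint: at most `limit` words may differ from the original text.
def max_words_changed(original: str, candidate: str, limit: int = 2) -> bool:
    changed = sum(a != b for a, b in zip(original.split(), candidate.split()))
    return changed <= limit


# Goal function: the attack succeeds when the predicted label changes.
def goal_reached(original_label: int, candidate: str) -> bool:
    return toy_model(candidate) != original_label


# Search method: breadth-first search over repeated transformations.
def attack(text: str, max_depth: int = 3) -> Optional[str]:
    original_label = toy_model(text)
    frontier = [text]
    for _ in range(max_depth):
        next_frontier = []
        for current in frontier:
            for candidate in transformation(current):
                if not max_words_changed(text, candidate):
                    continue
                if goal_reached(original_label, candidate):
                    return candidate
                next_frontier.append(candidate)
        frontier = next_frontier
    return None  # no adversarial example found within the search budget
```

Because each component is a separate function, one of them can be swapped (for example, a greedy search in place of breadth-first search) while the others stay fixed, which is the kind of isolated comparison the abstract mentions.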


TCNN: Triple Convolutional Neural Network Models for Retrieval-based Question Answering System in E-commerce

Apr 23, 2020
Shuangyong Song, Chao Wang

Automatic question-answering (QA) systems have boomed during the last few years, and commonly used techniques can be roughly categorized into Information Retrieval (IR)-based and generation-based. A key step in IR-based models is to retrieve the knowledge entries most similar to a given query from a QA knowledge base, and then rerank those entries with semantic matching models. In this paper, we aim to improve an IR-based e-commerce QA system, AliMe, with the proposed text matching models, including a basic Triple Convolutional Neural Network (TCNN) model and two Attention-based TCNN (ATCNN) models. Experimental results show their effect.

* 2 pages 
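The retrieve-then-rerank pipeline mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the token-overlap retriever and the character-trigram scorer are simple stand-ins chosen for this example, not the TCNN or ATCNN models, and all function names are hypothetical.

```python
from typing import List, Set, Tuple


def tokens(text: str) -> Set[str]:
    return set(text.lower().split())


def retrieve(query: str, knowledge_base: List[str], k: int = 3) -> List[str]:
    """Stage 1: keep the k entries that share the most tokens with the query."""
    q = tokens(query)
    ranked = sorted(knowledge_base, key=lambda e: len(q & tokens(e)), reverse=True)
    return ranked[:k]


def trigrams(text: str) -> Set[str]:
    s = text.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def match_score(query: str, entry: str) -> float:
    """Stand-in for a learned matching model: Jaccard similarity of character trigrams."""
    a, b = trigrams(query), trigrams(entry)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rerank(query: str, candidates: List[str]) -> List[Tuple[str, float]]:
    """Stage 2: reorder the retrieved candidates by the matching score."""
    scored = [(entry, match_score(query, entry)) for entry in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def answer(query: str, knowledge_base: List[str], k: int = 3) -> str:
    """Return the top entry after retrieval and reranking (assumes a non-empty knowledge base)."""
    return rerank(query, retrieve(query, knowledge_base, k))[0][0]
```

In a real system the second stage would be replaced by a trained semantic matching model; the two-stage split keeps that expensive model off the full knowledge base.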


Discovering associations in COVID-19 related research papers

Apr 06, 2020
Iztok Fister Jr., Karin Fister, Iztok Fister

The COVID-19 pandemic has already proven itself to be a global challenge. It shows how vulnerable humanity can be. It has also mobilized researchers from different sciences and different countries in the search for a way to fight this potentially fatal disease. In line with this, our study analyses the abstracts of papers related to COVID-19 and coronavirus-related research using association rule text mining in order to find the most interesting words, on the one hand, and the relationships between them, on the other. Then, a method called information cartography was applied to extract structured knowledge from a huge number of association rules. On the basis of these methods, the purpose of our study was to show how researchers have responded in similar epidemic/pandemic situations throughout history.

* arXiv admin note: text overlap with arXiv:2003.00348 
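As a rough illustration of association rule mining over text, the sketch below treats each abstract as a set of words and computes support and confidence for single-word rules. It is a generic textbook formulation, not the authors' pipeline; the thresholds, the function names, and the tiny example corpus are assumptions made for this example.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Rule = Tuple[str, str, float, float]  # (antecedent, consequent, support, confidence)


def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions (documents) that contain every word in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def mine_rules(
    transactions: List[FrozenSet[str]],
    min_support: float = 0.5,
    min_confidence: float = 0.7,
) -> List[Rule]:
    """Return rules of the form word -> word that pass both thresholds."""
    vocab = sorted(set().union(*transactions))
    rules: List[Rule] = []
    for a, b in combinations(vocab, 2):
        pair_support = support(frozenset({a, b}), transactions)
        if pair_support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
            confidence = pair_support / support(frozenset({lhs}), transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Made-up word sets standing in for preprocessed abstracts.
docs = [
    frozenset({"virus", "vaccine", "spread"}),
    frozenset({"virus", "vaccine"}),
    frozenset({"virus", "spread"}),
    frozenset({"vaccine", "trial"}),
]
rules = mine_rules(docs)
```

On this toy corpus the only rule that passes both thresholds is "spread" implies "virus": the pair occurs in half of the documents, and every document containing "spread" also contains "virus".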


A Deep Neural Framework for Contextual Affect Detection

Jan 28, 2020
Kumar Shikhar Deep, Asif Ekbal, Pushpak Bhattacharyya

A short and simple text carrying no emotion can represent some strong emotions when read along with its context, i.e., the same sentence can express extreme anger as well as happiness depending on its context. In this paper, we propose a Contextual Affect Detection (CAD) framework which learns the inter-dependence of words in a sentence, and at the same time the inter-dependence of sentences in a dialogue. Our proposed CAD framework is based on a Gated Recurrent Unit (GRU), which is further assisted by contextual word embeddings and other diverse hand-crafted feature sets. Evaluation and analysis suggest that our model outperforms the state-of-the-art methods by 5.49% and 9.14% on the Friends and EmotionPush datasets, respectively.

* LNCS, volume 11955, 2019, pp. 398-409 
* 12 pages, 5 tables and 3 figures. Accepted in ICONIP 2019 (International Conference on Neural Information Processing) Published in Lecture Notes in Computer Science, vol 11955. Springer, Cham 


Recurrent Neural Networks (RNNs): A gentle Introduction and Overview

Nov 23, 2019
Robin M. Schmidt

State-of-the-art solutions in the areas of "Language Modelling & Generating Text", "Speech Recognition", "Generating Image Descriptions" or "Video Tagging" have been using Recurrent Neural Networks as the foundation for their approaches. Understanding the underlying concepts is therefore of tremendous importance if we want to keep up with recent or upcoming publications in those areas. In this work we give a short overview of some of the most important concepts in the realm of Recurrent Neural Networks, which enables readers to easily understand the fundamentals, such as, but not limited to, "Backpropagation through Time" or "Long Short-Term Memory Units", as well as some of the more recent advances like the "Attention Mechanism" or "Pointer Networks". We also give recommendations for further reading on more complex topics where necessary.


How to Evaluate Word Representations of Informal Domain?

Nov 13, 2019
Yekun Chai, Naomi Saphra, Adam Lopez

Diverse word representations have surged in most state-of-the-art natural language processing (NLP) applications. Nevertheless, how to efficiently evaluate such word embeddings in informal domains, such as Twitter or forums, remains an ongoing challenge due to the lack of sufficient evaluation datasets. We derived a large list of variant spelling pairs from UrbanDictionary with the automatic approaches of weakly-supervised pattern-based bootstrapping and self-training linear-chain conditional random field (CRF). With these extracted relation pairs we promote the odds of eliding the text normalization procedure of traditional NLP pipelines and directly adopting representations of non-standard words in the informal domain. Our code is available.
