"Text": models, code, and papers

Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder

Oct 06, 2020
Alvin Chan, Yi Tay, Yew-Soon Ong, Aston Zhang

This paper demonstrates a fatal vulnerability in natural language inference (NLI) and text classification systems. More concretely, we present a 'backdoor poisoning' attack on NLP models. Our poisoning attack uses a conditional adversarially regularized autoencoder (CARA) to generate poisoned training samples by injecting poison in latent space. By adding just 1% poisoned data, our experiments show that a victim BERT-finetuned classifier's predictions can be steered to the poison target class with success rates of >80% when the input hypothesis is injected with the poison signature, demonstrating that NLI and text classification systems face a huge security risk.

* Accepted in EMNLP-Findings 2020, Camera Ready Version 
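
The >80% figure is a success rate over test inputs that carry the poison signature. The abstract does not spell out the exact formula, so the following is only a minimal Python sketch of how an attack success rate is commonly computed for a backdoored classifier. The function name, the label encoding and the rule of excluding inputs whose true label is already the target class are assumptions for illustration, not the authors' evaluation code.

```python
from typing import Sequence


def attack_success_rate(predictions: Sequence[int],
                        true_labels: Sequence[int],
                        target_class: int) -> float:
    """Share of signature-bearing test inputs that the model assigns to
    target_class. Assumption: inputs whose true label is already the
    target class are excluded, because predicting the target there says
    nothing about the backdoor."""
    eligible = [p for p, y in zip(predictions, true_labels) if y != target_class]
    if not eligible:
        raise ValueError("no test inputs outside the target class")
    return sum(p == target_class for p in eligible) / len(eligible)


# Hypothetical NLI labels: 0 = entailment, 1 = neutral, 2 = contradiction.
preds = [2, 2, 0, 2, 1]
labels = [0, 1, 2, 0, 1]
print(attack_success_rate(preds, labels, target_class=2))  # 0.75
```

Excluding target-class inputs keeps the metric from being inflated by examples the model would have labeled as the target anyway.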

The Convex Information Bottleneck Lagrangian

Jan 10, 2020
Borja Rodríguez-Gálvez, Ragnar Thobaben, Mikael Skoglund

The information bottleneck (IB) problem tackles the issue of obtaining relevant compressed representations $T$ of some random variable $X$ for the task of predicting $Y$. It is defined as a constrained optimization problem that maximizes the information the representation has about the task, $I(T;Y)$, while ensuring that a certain level of compression $r$ is achieved (i.e., $I(X;T) \leq r$). For practical reasons, the problem is usually solved by maximizing the IB Lagrangian (i.e., $\mathcal{L}_{\text{IB}}(T;\beta) = I(T;Y) - \beta I(X;T)$) for many values of $\beta \in [0,1]$. Then, the curve of maximal $I(T;Y)$ for a given $I(X;T)$ is drawn and a representation with the desired predictability and compression is selected. It is known that when $Y$ is a deterministic function of $X$, the IB curve cannot be explored, and another Lagrangian has been proposed to tackle this problem: the squared IB Lagrangian, $\mathcal{L}_{\text{sq-IB}}(T;\beta_{\text{sq}})=I(T;Y)-\beta_{\text{sq}}I(X;T)^2$. In this paper, we (i) present a general family of Lagrangians that allow for the exploration of the IB curve in all scenarios; (ii) provide the exact one-to-one mapping between the Lagrange multiplier and the desired compression rate $r$ for known IB curve shapes; and (iii) show that we can approximately obtain a specific compression level with the convex IB Lagrangian for both known and unknown IB curve shapes. This eliminates the burden of solving the optimization problem for many values of the Lagrange multiplier. That is, we prove that we can solve the original constrained problem with a single optimization.

* 10 pages of main text, 2 pages of references and 14 pages of appendices with the proofs, experimental details and caveats
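
To see why the IB Lagrangian fails when $Y$ is a deterministic function of $X$ while the squared IB Lagrangian does not, here is a small worked example in Python. It assumes, for illustration, the piecewise-linear IB curve $I(T;Y) = \min(I(X;T), H(Y))$ with hypothetical entropies $H(Y) = 1$ and $H(X) = 3$ bits, and simply maximises each Lagrangian over a grid of compression levels $r = I(X;T)$. It is a sketch of the phenomenon described above, not the paper's method.

```python
def ib_curve(r: float, h_y: float) -> float:
    """Assumed IB curve when Y is a deterministic function of X:
    I(T;Y) = min(I(X;T), H(Y))."""
    return min(r, h_y)


def best_compression(objective, r_max: float, steps: int = 30_000) -> float:
    """Brute-force argmax of objective(r) over r = I(X;T) in [0, r_max]."""
    grid = (r_max * k / steps for k in range(steps + 1))
    return max(grid, key=objective)


H_Y, H_X = 1.0, 3.0  # hypothetical entropies (bits)

# IB Lagrangian, I(T;Y) - beta * I(X;T): every beta in (0, 1) picks r = H(Y).
standard = [
    best_compression(lambda r, b=b: ib_curve(r, H_Y) - b * r, H_X)
    for b in (0.2, 0.5, 0.8)
]

# Squared IB Lagrangian, I(T;Y) - beta_sq * I(X;T)^2: different multipliers
# reach different compression levels.
squared = {
    b: best_compression(lambda r, b=b: ib_curve(r, H_Y) - b * r ** 2, H_X)
    for b in (0.625, 1.0, 2.0)
}

print([round(r, 3) for r in standard])               # [1.0, 1.0, 1.0]
print({b: round(r, 3) for b, r in squared.items()})  # {0.625: 0.8, 1.0: 0.5, 2.0: 0.25}
```

With the linear penalty, every $\beta \in (0,1)$ selects the same corner point $r = H(Y)$, so the curve cannot be traced. With the squared penalty, setting the derivative $1 - 2\beta_{\text{sq}} r$ to zero gives $r^* = \min(1/(2\beta_{\text{sq}}), H(Y))$ on this toy curve, a one-to-one map between multiplier and compression level in the spirit of point (ii).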

Deep Laplacian Pyramid Network for Text Images Super-Resolution

Nov 26, 2018
Hanh T. M. Tran, Tien Ho-Phuoc

Convolutional neural networks have recently demonstrated interesting results for single image super-resolution. However, these networks were trained to deal with the super-resolution problem on natural images. In this paper, we adapt a deep network, which was proposed for natural image super-resolution, to single text image super-resolution. To evaluate the network, we present our own database for single text image super-resolution. Moreover, we propose to combine the Gradient Difference Loss (GDL) with the L1/L2 loss to enhance edges in the super-resolved image. Quantitative and qualitative evaluations on our dataset show that adding the GDL improves the super-resolution results.

* paper, 6 pages 
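
Gradient Difference Loss compares image gradients rather than raw intensities, so it penalises blurred edges that a plain L1/L2 loss tolerates. Below is a minimal pure-Python sketch of one common formulation (absolute neighbour differences in both directions, raised to a power alpha and averaged), added to an L1 term. The exponent, the averaging and the hypothetical gdl_weight are assumptions; the paper's exact formulation and weighting may differ.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def gradient_difference_loss(pred: Image, target: Image, alpha: float = 1.0) -> float:
    """Compare |intensity differences| between neighbouring pixels of the
    prediction and the target, vertically and horizontally, averaged over
    all neighbour pairs."""
    rows, cols = len(target), len(target[0])
    total, pairs = 0.0, 0
    for i in range(rows):
        for j in range(cols):
            if i > 0:  # vertical neighbour pair (i-1, j) -> (i, j)
                g_true = abs(target[i][j] - target[i - 1][j])
                g_pred = abs(pred[i][j] - pred[i - 1][j])
                total += abs(g_true - g_pred) ** alpha
                pairs += 1
            if j > 0:  # horizontal neighbour pair (i, j-1) -> (i, j)
                g_true = abs(target[i][j] - target[i][j - 1])
                g_pred = abs(pred[i][j] - pred[i][j - 1])
                total += abs(g_true - g_pred) ** alpha
                pairs += 1
    return total / pairs if pairs else 0.0


def l1_loss(pred: Image, target: Image) -> float:
    """Mean absolute pixel error."""
    diffs = [abs(p - t) for p_row, t_row in zip(pred, target) for p, t in zip(p_row, t_row)]
    return sum(diffs) / len(diffs)


def combined_loss(pred: Image, target: Image, gdl_weight: float = 1.0) -> float:
    """L1 + weighted GDL; gdl_weight is a hypothetical hyperparameter."""
    return l1_loss(pred, target) + gdl_weight * gradient_difference_loss(pred, target)


target = [[0.0, 1.0], [0.0, 1.0]]       # sharp vertical edge
blurred = [[0.5, 0.5], [0.5, 0.5]]      # edge smeared away
shifted = [[0.25, 1.25], [0.25, 1.25]]  # edge kept, brightness offset

print(l1_loss(blurred, target), gradient_difference_loss(blurred, target))  # 0.5 0.5
print(l1_loss(shifted, target), gradient_difference_loss(shifted, target))  # 0.25 0.0
```

In this toy case the blurred output pays a GDL penalty for losing the edge, while the shifted output keeps the edge and pays none; only its L1 term is non-zero.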

DualTKB: A Dual Learning Bridge between Text and Knowledge Base

Oct 27, 2020
Pierre L. Dognin, Igor Melnyk, Inkit Padhi, Cicero Nogueira dos Santos, Payel Das

In this work, we present a dual learning approach for unsupervised text-to-path and path-to-text transfers in Commonsense Knowledge Bases (KBs). We investigate the impact of weak supervision by creating a weakly supervised dataset and show that even a slight amount of supervision can significantly improve model performance and enable better-quality transfers. We examine different model architectures and evaluation metrics, and propose a novel Commonsense KB completion metric tailored for generative models. Extensive experimental results show that the proposed method compares very favorably to existing baselines. This approach is a viable step towards a more advanced system for automatic KB construction/expansion and the reverse operation of converting KBs to coherent textual descriptions.

* Equal Contributions of Authors Pierre L. Dognin, Igor Melnyk, and Inkit Padhi. Accepted at EMNLP'20 

fairseq S2T: Fast Speech-to-Text Modeling with fairseq

Oct 11, 2020
Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino

We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation. It follows fairseq's careful design for scalability and extensibility. We provide end-to-end workflows from data pre-processing and model training to offline (online) inference. We implement state-of-the-art RNN-based as well as Transformer-based models and open-source detailed training recipes. Fairseq's machine translation models and language models can be seamlessly integrated into S2T workflows for multi-task learning or transfer learning. Fairseq S2T documentation and examples are available at

* Accepted to AACL 2020 Demo 

Interpretable Text Classification Using CNN and Max-pooling

Oct 14, 2019
Hao Cheng, Xiaoqing Yang, Zang Li, Yanghua Xiao, Yucheng Lin

Deep neural networks have been widely used in text classification. However, neural models are hard to interpret because of their complicated mechanisms. In this work, we study the interpretability of a variant of the typical text classification model based on a convolutional operation and a max-pooling layer. Two mechanisms, convolution attribution and n-gram feature analysis, are proposed to analyse how the CNN model processes its input. The interpretability of the model is reflected by providing posterior interpretations for neural network predictions. In addition, a multi-sentence strategy is proposed so that the model can be used in multi-sentence settings without loss of performance or interpretability. We evaluate the performance of the model on several classification tasks and justify its interpretability with some case studies.
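
The n-gram feature analysis rests on a simple property of max-pooling: each filter's pooled activation comes from exactly one window of the input, so it can be traced back to an n-gram. A minimal Python sketch of that tracing, assuming a single-channel convolution with no bias or nonlinearity and hypothetical 2-d word embeddings; it is not the authors' attribution procedure.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def max_pooled_ngrams(tokens: Sequence[str],
                      embeddings: Sequence[Vector],
                      filters: Sequence[Sequence[Vector]]) -> List[Tuple[float, Tuple[str, ...]]]:
    """For each filter (k weight vectors, one per position in a width-k window),
    slide it over the sentence, max-pool over positions, and return the pooled
    activation with the n-gram found at the argmax position."""
    results = []
    for filt in filters:
        k = len(filt)
        if len(tokens) < k:
            raise ValueError("sentence shorter than filter width")
        best_score, best_start = float("-inf"), 0
        for start in range(len(tokens) - k + 1):
            # Convolution at this position: sum of dot products over the window.
            score = sum(
                w * x
                for offset in range(k)
                for w, x in zip(filt[offset], embeddings[start + offset])
            )
            if score > best_score:  # max-pooling keeps the strongest window
                best_score, best_start = score, start
        results.append((best_score, tuple(tokens[best_start:best_start + k])))
    return results


# Hypothetical 2-d embeddings: dimension 0 ~ "negation", dimension 1 ~ "positive".
tokens = ["the", "movie", "was", "not", "good"]
embeddings = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
filters = [
    [[1.0, 0.0], [0.0, 1.0]],  # width 2: fires on "negation then positive"
    [[0.0, 1.0]],              # width 1: fires on a positive word
]
print(max_pooled_ngrams(tokens, embeddings, filters))
# [(2.0, ('not', 'good')), (1.0, ('good',))]
```

Because only the argmax window survives pooling, each filter's contribution to the prediction can be explained by a single n-gram, which is what makes this architecture comparatively easy to inspect.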

Learning Robust, Transferable Sentence Representations for Text Classification

Sep 28, 2018
Wasi Uddin Ahmad, Xueying Bai, Nanyun Peng, Kai-Wei Chang

Although deep recurrent neural networks (RNNs) demonstrate strong performance in text classification, training RNN models is often expensive and requires an extensive collection of annotated data, which may not be available. To overcome this data limitation, existing approaches leverage either pre-trained word embeddings or sentence representations to lift the burden of training RNNs from scratch. In this paper, we show that jointly learning sentence representations from multiple text classification tasks and combining them with pre-trained word-level and sentence-level encoders results in robust sentence representations that are useful for transfer learning. Extensive experiments and analyses using a wide range of transfer and linguistic tasks endorse the effectiveness of our approach.

* arXiv admin note: substantial text overlap with arXiv:1804.07911 

Transformer-based Approaches for Legal Text Processing

Feb 13, 2022
Ha-Thanh Nguyen, Minh-Phuong Nguyen, Thi-Hai-Yen Vuong, Minh-Quan Bui, Minh-Chau Nguyen, Tran-Binh Dang, Vu Tran, Le-Minh Nguyen, Ken Satoh

In this paper, we introduce our approaches using Transformer-based models for different problems of the COLIEE 2021 automatic legal text processing competition. Automated processing of legal documents is a challenging task because of the characteristics of legal documents as well as the limited amount of available data. Through detailed experiments, we found that Transformer-based pretrained language models, combined with appropriate approaches, can perform well on automated legal text processing problems. We describe in detail the processing steps for each task, such as problem formulation, data processing and augmentation, pretraining, and finetuning. In addition, we introduce to the community two pretrained models that take advantage of parallel translations in the legal domain, NFSP and NMSP. Of these, NFSP achieves the state-of-the-art result in Task 5 of the competition. Although the paper focuses on technical reporting, the novelty of its approaches can also serve as a useful reference for automated legal document processing using Transformer-based models.

* arXiv admin note: substantial text overlap with arXiv:2106.13405 

Supervised Contrastive Learning for Interpretable Long Document Comparison

Aug 20, 2021
Akshita Jha, Vineeth Rakesh, Jaideep Chandrashekhar, Adithya Samavedhi, Chandan K. Reddy

Recent advancements in deep learning techniques have transformed the area of semantic text matching. However, most of the state-of-the-art models are designed to operate with short documents such as tweets, user reviews, comments, etc., and have fundamental limitations when applied to long-form documents such as scientific papers, legal documents, and patents. When handling such long documents, there are three primary challenges: (i) The presence of different contexts for the same word throughout the document, (ii) Small sections of contextually similar text between two documents, but dissimilar text in the remaining parts -- this defies the basic understanding of "similarity", and (iii) The coarse nature of a single global similarity measure which fails to capture the heterogeneity of the document content. In this paper, we describe CoLDE: Contrastive Long Document Encoder -- a transformer-based framework that addresses these challenges and allows for interpretable comparisons of long documents. CoLDE uses unique positional embeddings and a multi-headed chunkwise attention layer in conjunction with a contrastive learning framework to capture similarity at three different levels: (i) high-level similarity scores between a pair of documents, (ii) similarity scores between different sections within and across documents, and (iii) similarity scores between different chunks in the same document and also other documents. These fine-grained similarity scores aid in better interpretability. We evaluate CoLDE on three long document datasets namely, ACL Anthology publications, Wikipedia articles, and USPTO patents. Besides outperforming the state-of-the-art methods on the document comparison task, CoLDE also proves interpretable and robust to changes in document length and text perturbations.
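
CoLDE learns its chunk representations with a transformer and a contrastive objective; the sketch below skips all of that and only illustrates how a chunk-by-chunk similarity matrix gives a fine-grained view that a single document score hides. It is a minimal Python sketch under stated assumptions: the chunk vectors are hypothetical, and the aggregation rule (mean of each chunk's best match) is an assumption, not CoLDE's scoring function.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def chunk_similarity_matrix(doc_a: Sequence[Vector], doc_b: Sequence[Vector]) -> List[List[float]]:
    """Fine-grained view: similarity of every chunk in A to every chunk in B."""
    return [[cosine(a, b) for b in doc_b] for a in doc_a]


def document_similarity(matrix: List[List[float]]) -> float:
    """Coarse view (assumed rule): average over A's chunks of each chunk's best match in B."""
    return sum(max(row) for row in matrix) / len(matrix)


# Hypothetical chunk embeddings: A's first chunk matches B, its second does not.
doc_a = [[1.0, 0.0], [0.0, 1.0]]
doc_b = [[1.0, 0.0], [1.0, 0.0]]
matrix = chunk_similarity_matrix(doc_a, doc_b)
print(matrix)                       # [[1.0, 1.0], [0.0, 0.0]]
print(document_similarity(matrix))  # 0.5
```

The single score of 0.5 hides the fact that one section matches perfectly and the other not at all, which is the situation described in challenge (ii); the matrix keeps that distinction visible.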

Sarcasm Detection and Quantification in Arabic Tweets

Aug 03, 2021
Bashar Talafha, Muhy Eddin Za'ter, Samer Suleiman, Mahmoud Al-Ayyoub, Mohammed N. Al-Kabi

The task of predicting sarcasm in text is known as automatic sarcasm detection. Given the prevalence and challenges of sarcasm in sentiment-bearing text, this is a critical step in most sentiment analysis tasks. With the increasing popularity and usage of social media platforms among users around the world, people are using sarcasm more and more in their day-to-day conversations, social media posts and tweets, and it is considered a way for people to express their sentiment about certain topics or issues. As a result, researchers have started to focus on detecting sarcasm in text in different languages, especially English. However, sarcasm detection is challenging due to the nature of sarcastic texts, which can be relative and differ significantly from one person to another depending on the topic, region, the user's mentality and other factors. In addition to these challenges, sarcasm detection in Arabic has its own difficulties due to the complexity of the language: it is morphologically rich, has many dialects that vary significantly from each other, and is low-resourced. In recent years, only a few research efforts have tackled sarcasm detection in Arabic, including creating and collecting corpora, organizing workshops and establishing baseline models. This paper aims to create a new human-annotated Arabic corpus for sarcasm detection collected from tweets, and to implement a new approach for sarcasm detection and quantification in Arabic tweets. The annotation technique followed in this paper is unique in sarcasm detection, and the proposed approach treats the problem as regression instead of classification; i.e., the model attempts to predict the level of sarcasm instead of performing binary classification.
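
Treating sarcasm as a regression target means the model outputs a graded level rather than a yes/no label. A minimal Python sketch of that distinction, using single-feature least squares on hypothetical toy data; the feature, the [0, 1] range of the level and the clipping are assumptions, and neither the paper's model nor its annotation scheme is reproduced here.

```python
from typing import Sequence, Tuple


def fit_least_squares(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ~ w * x + b with a single feature
    (assumes the xs are not all equal)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    return w, mean_y - w * mean_x


def predict_level(w: float, b: float, x: float) -> float:
    """Regression output clipped to an assumed sarcasm-level range of [0, 1]."""
    return min(1.0, max(0.0, w * x + b))


# Hypothetical toy data: one numeric feature per tweet, graded sarcasm level as target.
xs = [0.0, 1.0, 2.0, 3.0]
levels = [0.0, 0.25, 0.5, 0.75]
w, b = fit_least_squares(xs, levels)
print(w, b)  # 0.25 0.0

# A binary classifier thresholded at 0.5 would give the last two tweets the
# same label; the regression output keeps them apart.
print([predict_level(w, b, x) for x in xs])  # [0.0, 0.25, 0.5, 0.75]
```

The regression framing preserves the ordering between "somewhat" and "very" sarcastic tweets, information that a binary label discards.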