"Text Classification": models, code, and papers

Learning Robust, Transferable Sentence Representations for Text Classification

Sep 28, 2018
Wasi Uddin Ahmad, Xueying Bai, Nanyun Peng, Kai-Wei Chang

Although deep recurrent neural networks (RNNs) demonstrate strong performance in text classification, training RNN models is often expensive and requires an extensive collection of annotated data, which may not be available. To overcome the data limitation issue, existing approaches leverage either pre-trained word embeddings or sentence representations to lift the burden of training RNNs from scratch. In this paper, we show that jointly learning sentence representations from multiple text classification tasks and combining them with pre-trained word-level and sentence-level encoders results in robust sentence representations that are useful for transfer learning. Extensive experiments and analyses using a wide range of transfer and linguistic tasks endorse the effectiveness of our approach.

* arXiv admin note: substantial text overlap with arXiv:1804.07911 

Text Classification based on Multiple Block Convolutional Highways

Jul 23, 2018
Seyed Mahdi Rezaeinia, Ali Ghodsi, Rouhollah Rahmani

In the Text Classification areas of Sentiment Analysis, Subjectivity/Objectivity Analysis, and Opinion Polarity, Convolutional Neural Networks have gained special attention because of their performance and accuracy. In this work, we apply recent advances in CNNs and propose a novel architecture, Multiple Block Convolutional Highways (MBCH), which achieves improved accuracy on multiple popular benchmark datasets compared to previous architectures. The MBCH is based on new techniques and architectures including highway networks, DenseNet, batch normalization and bottleneck layers. In addition, to cope with the limitations of existing pre-trained word vectors, which are used as inputs for the CNN, we propose a novel method, Improved Word Vectors (IWV). The IWV improves the accuracy of CNNs that are used for text classification tasks.

* arXiv admin note: text overlap with arXiv:1711.08609 
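
The MBCH abstract above names highway networks as one of its building blocks. As a rough illustration only, the following sketch shows a single generic highway layer, y = T(x) * H(x) + (1 - T(x)) * x, in plain Python. It is not the authors' MBCH architecture: the function names, the tanh nonlinearity, and the weight layout are assumptions made for this example.

```python
import math


def sigmoid(z):
    """Logistic function used for the transform gate."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, x):
    """Compute W x + b for a list-of-rows weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def highway_layer(x, w_h, b_h, w_t, b_t):
    """One generic highway layer (hypothetical helper, not from the paper).

    h is the candidate transform H(x), t is the transform gate T(x).
    Input and output must have the same dimension so x can be carried over.
    """
    h = [math.tanh(v) for v in affine(w_h, b_h, x)]
    t = [sigmoid(v) for v in affine(w_t, b_t, x)]
    # Gate close to 0 copies the input; gate close to 1 uses the transform.
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    # A strongly negative gate bias makes the layer behave like a skip connection.
    print(highway_layer([0.5, -1.0], identity, [0.0, 0.0], zeros, [-50.0, -50.0]))
```

Initializing the gate bias to a negative value is a common way to start such a layer close to the identity mapping, which is one reason highway-style layers are easier to stack deeply.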

Pre-Trained Language Transformers are Universal Image Classifiers

Jan 25, 2022
Rahul Goel, Modar Sulaiman, Kimia Noorbakhsh, Mahdi Sharifi, Rajesh Sharma, Pooyan Jamshidi, Kallol Roy

Facial images disclose many hidden personal traits such as age, gender, race, health, emotion, and psychology. Understanding these traits helps to classify people by different attributes. In this paper, we present a novel method for classifying images using a pretrained transformer model. We apply the pretrained transformer to the binary classification of facial images into criminal and non-criminal classes. The pretrained GPT-2 transformer is trained to generate text and then fine-tuned to classify facial images. During fine-tuning with images, most of the layers of GPT-2 are frozen during backpropagation, and the resulting model is a frozen pretrained transformer (FPT). The FPT acts as a universal image classifier, and this paper shows the application of FPT to facial images. We also use our FPT to classify encrypted images. Our FPT shows high accuracy on both raw facial images and encrypted images. We hypothesize, with theory and experiments, that the FPT gains its meta-learning capacity from its large size and large-scale training. GPT-2, trained to generate a single word token at a time through the autoregressive process, is forced toward a heavy-tailed distribution. The FPT then uses the heavy-tail property as its meta-learning capacity for classifying images. Our work shows one way to avoid bias during the machine classification of images. The FPT encodes worldly knowledge because of its pretraining on text, which it uses during classification. The statistical error of classification is reduced because of the added context gained from the text. Our paper shows the ethical dimension of using encrypted data for classification. Criminal images are sensitive to share across boundaries, but encrypting them largely evades this ethical concern. The good classification accuracy of FPT on encrypted images shows promise for further research on privacy-preserving machine learning.


Towards Robustness to Label Noise in Text Classification via Noise Modeling

Jan 27, 2021
Siddhant Garg, Goutham Ramakrishnan, Varun Thumbe

Large datasets in NLP suffer from noisy labels, due to erroneous automatic and human annotation procedures. We study the problem of text classification with label noise, and aim to capture this noise through an auxiliary noise model over the classifier. We first assign a probability score to each training sample of having a noisy label, through a beta mixture model fitted on the losses at an early epoch of training. Then, we use this score to selectively guide the learning of the noise model and classifier. Our empirical evaluation on two text classification tasks shows that our approach can improve over the baseline accuracy, and prevent over-fitting to the noise.
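
As a rough illustration of the scoring step described above, the sketch below computes the posterior probability that a sample's label is noisy under a two-component beta mixture over per-sample losses. It assumes the mixture has already been fitted and that losses have been rescaled to the interval (0, 1). The function names and the component parameters are hypothetical and are not taken from the paper.

```python
import math


def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, with 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def noisy_label_probability(loss, clean, noisy, noisy_weight, eps=1e-6):
    """Posterior probability that a sample belongs to the noisy component.

    clean and noisy are (a, b) parameter pairs of the two beta components
    (assumed already fitted); noisy_weight is the mixing weight of the
    noisy component. The loss is clamped away from 0 and 1 for stability.
    """
    x = min(max(loss, eps), 1.0 - eps)
    p_noisy = noisy_weight * beta_pdf(x, *noisy)
    p_clean = (1.0 - noisy_weight) * beta_pdf(x, *clean)
    return p_noisy / (p_noisy + p_clean)


if __name__ == "__main__":
    # Hypothetical parameters: clean samples concentrate at low loss,
    # noisy samples at high loss.
    clean_params, noisy_params = (2.0, 5.0), (5.0, 2.0)
    for loss in (0.1, 0.5, 0.9):
        score = noisy_label_probability(loss, clean_params, noisy_params, 0.5)
        print(loss, round(score, 4))
```

The resulting score can then weight how strongly each sample influences the noise model versus the classifier. How that weighting is applied is specific to the paper and is not reproduced here.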


Conditional Variance Penalties and Domain Shift Robustness

May 08, 2018
Christina Heinze-Deml, Nicolai Meinshausen

When training a deep network for image classification, one can broadly distinguish between two types of latent features of images that will drive the classification. Following the notation of Gong et al. (2016), we can divide latent features into (i) "core" features $X^\text{core}$ whose distribution $X^\text{core}\vert Y$ does not change substantially across domains and (ii) "style" features $X^{\text{style}}$ whose distribution $X^{\text{style}}\vert Y$ can change substantially across domains. These latter orthogonal features would generally include features such as rotation, image quality or brightness but also more complex ones like hair color or posture for images of persons. Guarding against future adversarial domain shifts implies that the influence of the second type of features, the style features, on the prediction has to be limited. We assume that the domain itself is not observed and is hence a latent variable. We do assume, however, that we can sometimes observe a typically discrete identifier or $\mathrm{ID}$ variable. We know in some applications, for example, that two images show the same person, and $\mathrm{ID}$ then refers to the identity of the person. The method requires only a small fraction of images to have an $\mathrm{ID}$ variable. We group data samples if they share the same class and identifier $(Y,\mathrm{ID})=(y,\mathrm{id})$ and penalize the conditional variance of the prediction if we condition on $(Y,\mathrm{ID})$. This approach is shown to protect against shifts in the distribution of the style variables for both regression and classification models. Specifically, the conditional variance penalty CoRe is shown to be equivalent to minimizing the risk under noise interventions in a regression setting and is shown to lead to adversarial risk consistency in a partially linear classification setting.
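
The grouping-and-penalty idea in the abstract above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' CoRe implementation: the function name, the use of `None` for samples without an identifier, and the averaging over groups are choices made for this example.

```python
from collections import defaultdict


def conditional_variance_penalty(predictions, labels, ids):
    """Average within-group variance of predictions, grouped by (label, id).

    Samples whose id is None are ignored, reflecting that only a fraction
    of samples carry an identifier. Groups with a single member contribute
    nothing, since their variance is zero by definition.
    """
    groups = defaultdict(list)
    for pred, label, ident in zip(predictions, labels, ids):
        if ident is not None:
            groups[(label, ident)].append(pred)

    variances = []
    for preds in groups.values():
        if len(preds) < 2:
            continue
        mean = sum(preds) / len(preds)
        variances.append(sum((p - mean) ** 2 for p in preds) / len(preds))
    return sum(variances) / len(variances) if variances else 0.0


if __name__ == "__main__":
    preds = [1.0, 3.0, 2.0, 2.0, 5.0]
    labels = [0, 0, 1, 1, 1]
    ids = ["a", "a", "b", "b", None]
    print(conditional_variance_penalty(preds, labels, ids))
```

In training, such a term would typically be added to the usual loss with a tuning weight, for example `total = loss + lam * penalty`, where `lam` is a hypothetical hyperparameter controlling how strongly predictions are forced to agree within each (class, identifier) group.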


No Token Left Behind: Explainability-Aided Image Classification and Generation

Apr 11, 2022
Roni Paiss, Hila Chefer, Lior Wolf

The application of zero-shot learning in computer vision has been revolutionized by the use of image-text matching models. The most notable example, CLIP, has been widely used for both zero-shot classification and guiding generative models with a text prompt. However, the zero-shot use of CLIP is unstable with respect to the phrasing of the input text, making it necessary to carefully engineer the prompts used. We find that this instability stems from a selective similarity score, which is based only on a subset of the semantically meaningful input tokens. To mitigate it, we present a novel explainability-based approach, which adds a loss term to ensure that CLIP focuses on all relevant semantic parts of the input, in addition to employing the CLIP similarity loss used in previous works. When applied to one-shot classification through prompt engineering, our method yields an improvement in the recognition rate, without additional training or fine-tuning. Additionally, we show that CLIP guidance of generative models using our method significantly improves the generated images. Finally, we demonstrate a novel use of CLIP guidance for text-based image generation with spatial conditioning on object location, by requiring the image explainability heatmap for each object to be confined to a pre-determined bounding box.


Heterogeneous Graph Neural Networks for Multi-label Text Classification

Mar 26, 2021
Irene Li, Tianxiao Li, Yixin Li, Ruihai Dong, Toyotaro Suzumura

Multi-label text classification (MLTC) is an attractive and challenging task in natural language processing (NLP). Compared with single-label text classification, MLTC has a wider range of applications in practice. In this paper, we propose a heterogeneous graph convolutional network model to solve the MLTC problem by modeling tokens and labels as nodes in a heterogeneous graph. In this way, we are able to take into account multiple relationships, including token-level relationships. Moreover, the model offers good explainability, as the token-label edges are exposed. We evaluate our method on three real-world datasets, and the experimental results show that it achieves significant improvements and outperforms state-of-the-art comparison methods.

* 8 tables, 4 figures 

A Comparison of Neural Network Training Methods for Text Classification

Oct 28, 2019
Anderson de Andrade

We study the impact of neural networks in text classification. Our focus is on training deep neural networks with proper weight initialization and greedy layer-wise pretraining. Results are compared with 1-layer neural networks and Support Vector Machines. We work with a dataset of labeled messages from the Twitter microblogging service and aim to predict weather conditions. A feature extraction procedure specific to the task is proposed, which applies dimensionality reduction using Latent Semantic Analysis. Our results show that neural networks outperform Support Vector Machines with Gaussian kernels, and we observe performance gains from introducing additional hidden layers with nonlinearities. The impact of using Nesterov's Accelerated Gradient in backpropagation is also studied. We conclude that deep neural networks are a reasonable approach for text classification and propose further ideas to improve performance.


Improving patch-based scene text script identification with ensembles of conjoined networks

Feb 01, 2017
Lluis Gomez, Anguelos Nicolaou, Dimosthenis Karatzas

This paper focuses on the problem of script identification in scene text images. Facing this problem with state-of-the-art CNN classifiers is not straightforward, as they fail to address a key characteristic of scene text instances: their extremely variable aspect ratio. Instead of resizing input images to a fixed aspect ratio as in the typical use of holistic CNN classifiers, we propose here a patch-based classification framework in order to preserve discriminative parts of the image that are characteristic of its class. We describe a novel method based on the use of ensembles of conjoined networks to jointly learn discriminative stroke-parts representations and their relative importance in a patch-based classification scheme. Our experiments with this learning procedure demonstrate state-of-the-art results on two public script identification datasets. In addition, we propose a new public benchmark dataset for the evaluation of multi-lingual scene text end-to-end reading systems. Experiments on this dataset demonstrate the key role of script identification in a complete end-to-end system that combines our script identification method with a previously published text detector and an off-the-shelf OCR engine.


Appraisal Theories for Emotion Classification in Text

Apr 07, 2020
Jan Hofmann, Enrica Troiano, Kai Sassenberg, Roman Klinger

Automatic emotion categorization has been predominantly formulated as text classification in which textual units are assigned to an emotion from a predefined inventory, for instance following the fundamental emotion classes proposed by Paul Ekman (fear, joy, anger, disgust, sadness, surprise) or Robert Plutchik (adding trust, anticipation). This approach ignores existing psychological theories to some degree, which provide explanations regarding the perception of events (for instance, that somebody experiences fear when they discover a snake because of the appraisal as being an unpleasant and non-controllable situation), even without having access to explicit reports of what an experiencer of an emotion is feeling (for instance, expressing this with the words "I am afraid."). Automatic classification approaches therefore need to learn properties of events as latent variables (for instance, that the uncertainty and effort associated with discovering the snake lead to fear). With this paper, we propose to make such interpretations of events explicit, following theories of cognitive appraisal of events, and show their potential for emotion classification when encoded in classification models. Our results show that high-quality appraisal dimension assignments in event descriptions lead to an improvement in the classification of discrete emotion categories.