"Text Classification": models, code, and papers

Zero-Shot Text Classification via Self-Supervised Tuning

May 25, 2023
Chaoqun Liu, Wenxuan Zhang, Guizhen Chen, Xiaobao Wu, Anh Tuan Luu, Chip Hong Chang, Lidong Bing

Existing solutions to zero-shot text classification either conduct prompting with pre-trained language models, which is sensitive to the choice of templates, or rely on large-scale annotated data of relevant tasks for meta-tuning. In this work, we propose a new paradigm based on self-supervised learning to solve zero-shot text classification tasks by tuning the language models with unlabeled data, called self-supervised tuning. By exploring the inherent structure of free texts, we propose a new learning objective called first sentence prediction to bridge the gap between unlabeled data and text classification tasks. After tuning the model to learn to predict the first sentence in a paragraph based on the rest, the model is able to conduct zero-shot inference on unseen tasks such as topic classification and sentiment analysis. Experimental results show that our model outperforms the state-of-the-art baselines on 7 out of 10 tasks. Moreover, the analysis reveals that our model is less sensitive to prompt design. Our code and pre-trained models are publicly available at https://github.com/DAMO-NLP-SG/SSTuning.

* Accepted to the Findings of ACL 2023 
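
The first-sentence-prediction objective above can be pictured as turning raw paragraphs into multiple-choice training instances. Below is a minimal sketch of that data construction, assuming naive sentence splitting on periods; the function name, instance format, and number of options are illustrative and not taken from the authors' released code.

```python
import random

def build_fsp_examples(paragraphs, num_options=4, seed=0):
    """Turn unlabeled paragraphs into first-sentence-prediction instances.

    Each instance asks: given the rest of a paragraph, which candidate
    sentence was its first sentence? (Illustrative format only.)
    """
    rng = random.Random(seed)
    firsts, rests = [], []
    for p in paragraphs:
        sentences = [s.strip() for s in p.split(". ") if s.strip()]
        if len(sentences) < 2:
            continue
        firsts.append(sentences[0])
        rests.append(". ".join(sentences[1:]))

    examples = []
    for i, rest in enumerate(rests):
        distractors = rng.sample([f for j, f in enumerate(firsts) if j != i],
                                 k=min(num_options - 1, len(firsts) - 1))
        options = distractors + [firsts[i]]
        rng.shuffle(options)
        examples.append({
            "context": rest,
            "options": options,
            "label": options.index(firsts[i]),
        })
    return examples
```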

Label Agnostic Pre-training for Zero-shot Text Classification

May 25, 2023
Christopher Clarke, Yuzhao Heng, Yiping Kang, Krisztian Flautner, Lingjia Tang, Jason Mars

Conventional approaches to text classification typically assume the existence of a fixed set of predefined labels into which a given text can be classified. However, in real-world applications, there exists an infinite label space for describing a given text. In addition, depending on the aspect (sentiment, topic, etc.) and domain of the text (finance, legal, etc.), the interpretation of the label can vary greatly. This makes the task of text classification, particularly in the zero-shot scenario, extremely challenging. In this paper, we investigate the task of zero-shot text classification with the aim of improving the ability of pre-trained language models (PLMs) to generalize to both seen and unseen data across varying aspects and domains. To solve this, we introduce two new simple yet effective pre-training strategies, Implicit and Explicit pre-training. These methods inject aspect-level understanding into the model at train time with the goal of conditioning the model to build task-level understanding. To evaluate this, we construct and release UTCD, a new benchmark dataset for evaluating text classification in zero-shot settings. Experimental results on UTCD show that our approach achieves improved zero-shot generalization on a suite of challenging datasets across an array of zero-shot formalizations.

* Findings of ACL 2023 
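
For a concrete picture of aspect-conditioned zero-shot inference of the kind evaluated on UTCD, here is a minimal sketch using an off-the-shelf NLI-based zero-shot pipeline from Hugging Face; the model checkpoint and hypothesis template are placeholders, not the pre-trained models released with the paper.

```python
from transformers import pipeline

# Placeholder NLI checkpoint for illustration, not the models released with the paper.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def classify(text, labels, aspect):
    # Condition the hypothesis on the aspect (sentiment, topic, intent, ...)
    template = f"The {aspect} of this text is {{}}."
    return classifier(text, candidate_labels=labels, hypothesis_template=template)

print(classify("The battery died after two hours.",
               labels=["positive", "negative", "neutral"],
               aspect="sentiment"))
```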

PESCO: Prompt-enhanced Self Contrastive Learning for Zero-shot Text Classification

May 24, 2023
Yau-Shian Wang, Ta-Chung Chi, Ruohong Zhang, Yiming Yang

We present PESCO, a novel contrastive learning framework that substantially improves the performance of zero-shot text classification. We formulate text classification as a neural text matching problem where each document is treated as a query, and the system learns the mapping from each query to the relevant class labels by (1) adding prompts to enhance label matching, and (2) using retrieved labels to enrich the training set in a self-training loop of contrastive learning. PESCO achieves state-of-the-art performance on four benchmark text classification datasets. On DBpedia, we achieve 98.5% accuracy without any labeled data, which is close to the fully-supervised result. Extensive experiments and analyses show all the components of PESCO are necessary for improving the performance of zero-shot text classification.

* Accepted by ACL 2023
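
The label-matching formulation can be sketched as embedding documents and prompt-enhanced label descriptions into the same space and choosing the nearest label. The sketch below uses a generic sentence encoder and a made-up label set; the encoder, prompts, and self-training loop of the actual PESCO framework are not reproduced here.

```python
from sentence_transformers import SentenceTransformer

# Placeholder encoder and label set for illustration; PESCO trains its own encoder.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
labels = ["company", "athlete", "film", "village"]                     # DBpedia-style classes
prompted_labels = [f"This article is about a {l}." for l in labels]    # prompt-enhanced labels

def zero_shot_predict(docs):
    doc_emb = encoder.encode(docs, normalize_embeddings=True)
    lab_emb = encoder.encode(prompted_labels, normalize_embeddings=True)
    scores = doc_emb @ lab_emb.T        # cosine similarity of documents vs. prompted labels
    return [labels[i] for i in scores.argmax(axis=1)]

print(zero_shot_predict(["Lionel Messi scored twice in the final."]))
```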

Text Classification via Large Language Models

May 22, 2023
Xiaofei Sun, Xiaoya Li, Jiwei Li, Fei Wu, Shangwei Guo, Tianwei Zhang, Guoyin Wang

Despite the remarkable success of large-scale Language Models (LLMs) such as GPT-3, their performance still significantly lags behind that of fine-tuned models on the task of text classification. This is due to (1) the lack of reasoning ability in addressing complex linguistic phenomena (e.g., intensification, contrast, irony, etc.) and (2) the limited number of tokens allowed in in-context learning. In this paper, we introduce Clue And Reasoning Prompting (CARP). CARP adopts a progressive reasoning strategy tailored to addressing the complex linguistic phenomena involved in text classification: CARP first prompts LLMs to find superficial clues (e.g., keywords, tones, semantic relations, references, etc.), based on which a diagnostic reasoning process is induced for the final decision. To further address the limited-token issue, CARP uses a model fine-tuned on the supervised dataset for kNN demonstration search in in-context learning, allowing the model to take advantage of both the LLM's generalization ability and the task-specific evidence provided by the full labeled dataset. Remarkably, CARP yields new SOTA performance on 4 out of 5 widely used text-classification benchmarks: 97.39 (+1.24) on SST-2, 96.40 (+0.72) on AGNews, 98.78 (+0.25) on R8, and 96.95 (+0.6) on R52, and performance comparable to SOTA on MR (92.39 vs. 93.3). More importantly, we find that CARP delivers impressive abilities in low-resource and domain-adaptation setups. Specifically, using 16 examples per class, CARP achieves performance comparable to supervised models trained with 1,024 examples per class.

* Pre-print Version 
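
A rough sketch of the two CARP ingredients described above, clue-and-reasoning prompt construction and kNN demonstration search over encoder embeddings, is given below. The prompt wording, field names, and retrieval details are paraphrased assumptions, not the paper's exact templates.

```python
import numpy as np

def knn_demonstrations(query_emb, train_embs, train_examples, k=4):
    """Pick the k labeled examples nearest to the query (embeddings from a fine-tuned encoder)."""
    q = query_emb / np.linalg.norm(query_emb)
    t = train_embs / np.linalg.norm(train_embs, axis=1, keepdims=True)
    top = np.argsort(-(t @ q))[:k]
    return [train_examples[i] for i in top]

def build_carp_prompt(text, demonstrations):
    """Assemble a clue-and-reasoning prompt (the wording paraphrases the idea, not the paper's template)."""
    parts = []
    for demo in demonstrations:
        parts.append(
            f"Input: {demo['text']}\n"
            f"Clues: {demo['clues']}\n"
            f"Reasoning: {demo['reasoning']}\n"
            f"Label: {demo['label']}\n"
        )
    parts.append(
        f"Input: {text}\n"
        "First, list superficial clues (keywords, tone, semantic relations, references).\n"
        "Then, reason step by step from the clues to a final decision.\n"
        "Label:"
    )
    return "\n".join(parts)
```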

On Bias and Fairness in NLP: How to have a fairer text classification?

May 22, 2023
Fatma Elsafoury, Stamos Katsigiannis, Naeem Ramzan

In this paper, we provide a holistic analysis of the different sources of bias in NLP models, namely upstream, sample, and overamplification bias, and investigate how they impact the fairness of the text classification task. We also investigate the impact of removing these biases, using different debiasing techniques, on the fairness of text classification. We found that overamplification bias has the greatest impact on the fairness of text classification, and that removing it by fine-tuning the language models on a dataset with balanced representations of the different identity groups leads to fairer text classification models. Finally, we build on our findings and introduce practical guidelines on how to obtain a fairer text classification model.

* 10 pages 
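
One simple way to build the kind of balanced fine-tuning set the paper recommends for reducing overamplification bias is to downsample every identity group to the same size. The sketch below assumes each example carries an identity_group field; that field name and the downsampling scheme are choices of this illustration, not the paper's.

```python
import random
from collections import defaultdict

def balance_by_group(examples, group_key="identity_group", seed=0):
    """Downsample so every identity group is equally represented in the fine-tuning set."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for ex in examples:
        by_group[ex[group_key]].append(ex)
    n = min(len(items) for items in by_group.values())
    balanced = [ex for items in by_group.values() for ex in rng.sample(items, n)]
    rng.shuffle(balanced)
    return balanced
```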

WOT-Class: Weakly Supervised Open-world Text Classification

May 21, 2023
Tianle Wang, Zihan Wang, Weitang Liu, Jingbo Shang

State-of-the-art weakly supervised text classification methods, while significantly reducing the required human supervision, still require that supervision to cover all the classes of interest. This is rarely easy to meet in practice when humans explore new, large corpora without a complete picture. In this paper, we work on a novel yet important problem of weakly supervised open-world text classification, where supervision is only needed for a few examples from a few known classes and the machine should handle both known and unknown classes at test time. General open-world classification has been studied mostly in image classification; however, existing methods typically assume the availability of sufficient known-class supervision and strong unknown-class prior knowledge (e.g., the number of unknown classes and/or their data distribution). We propose a novel framework, WOT-Class, that lifts these strong assumptions. Specifically, it follows an iterative process of (a) clustering text into new classes, (b) mining and ranking indicative words for each class, and (c) merging redundant classes by using the overlapping indicative words as a bridge. Extensive experiments on 7 popular text classification datasets demonstrate that WOT-Class consistently outperforms strong baselines by a large margin, attaining 23.33% greater average absolute macro-F1 over existing approaches across all datasets. Such competent accuracy illuminates the practical potential of further reducing human effort for text classification.
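
The iterative process described above (cluster, mine indicative words, merge) can be sketched roughly as follows; the clustering choice (k-means over document embeddings), the word-scoring heuristic, and the overlap-based merge rule are simplifications for illustration, not the authors' implementation.

```python
from collections import Counter
from sklearn.cluster import KMeans

def one_wot_iteration(doc_embs, docs, n_clusters=10, top_k=10, min_overlap=0.5):
    """One illustrative round of cluster -> mine indicative words -> merge."""
    # (a) cluster documents into candidate classes
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(doc_embs)

    # (b) mine indicative words: words unusually frequent inside a cluster
    global_counts = Counter(w for d in docs for w in d.lower().split())
    indicative = {}
    for c in range(n_clusters):
        counts = Counter(w for d, l in zip(docs, labels) if l == c for w in d.lower().split())
        score = {w: counts[w] / (1 + global_counts[w]) for w in counts}
        indicative[c] = set(sorted(score, key=score.get, reverse=True)[:top_k])

    # (c) merge clusters whose indicative-word sets overlap heavily
    merged = {c: c for c in range(n_clusters)}
    for i in range(n_clusters):
        for j in range(i + 1, n_clusters):
            union = indicative[i] | indicative[j]
            if union and len(indicative[i] & indicative[j]) / len(union) >= min_overlap:
                merged[j] = merged[i]
    return labels, indicative, merged
```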

Hierarchical Verbalizer for Few-Shot Hierarchical Text Classification

May 26, 2023
Ke Ji, Yixin Lian, Jingsheng Gao, Baoyuan Wang

Due to the complex label hierarchy and intensive labeling cost in practice, hierarchical text classification (HTC) suffers from poor performance, especially in low-resource or few-shot settings. Recently, there has been a growing trend of applying prompts to pre-trained language models (PLMs), which has proven effective for few-shot flat text classification tasks. However, limited work has studied the prompt-based learning paradigm for HTC when training data is extremely scarce. In this work, we define a path-based few-shot setting and establish a strict path-based evaluation metric to further explore few-shot HTC tasks. To address the issue, we propose a hierarchical verbalizer ("HierVerb"), a multi-verbalizer framework that treats HTC as a single- or multi-label classification problem at multiple layers and learns vectors as verbalizers constrained by the hierarchical structure and hierarchical contrastive learning. In this manner, HierVerb fuses label hierarchy knowledge into verbalizers and remarkably outperforms methods that inject hierarchy through graph encoders, maximizing the benefits of PLMs. Extensive experiments on three popular HTC datasets under few-shot settings demonstrate that prompting with HierVerb significantly boosts HTC performance, while indicating an elegant way to bridge the gap between large pre-trained models and downstream hierarchical classification tasks. Our code and few-shot dataset are publicly available at https://github.com/1KE-JI/HierVerb.

* 14 pages, 8 figures, Accepted by ACL 2023 
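
The multi-verbalizer idea, learnable label vectors at every level of the hierarchy scored against a [MASK]-position representation, can be sketched as a small PyTorch module; the hierarchy-aware constraints and contrastive losses of the real HierVerb are omitted, and all sizes here are placeholders.

```python
import torch
import torch.nn as nn

class MultiLevelVerbalizer(nn.Module):
    """Toy multi-verbalizer head: one set of learnable label vectors per hierarchy level."""
    def __init__(self, hidden_size, labels_per_level):
        super().__init__()
        self.verbalizers = nn.ParameterList(
            [nn.Parameter(torch.randn(n, hidden_size) * 0.02) for n in labels_per_level]
        )

    def forward(self, mask_hidden):
        # mask_hidden: (batch, hidden_size) representation at the [MASK] position
        return [mask_hidden @ v.T for v in self.verbalizers]  # one logit matrix per level

head = MultiLevelVerbalizer(hidden_size=768, labels_per_level=[7, 46])  # placeholder 2-level taxonomy
level_logits = head(torch.randn(2, 768))
print([logits.shape for logits in level_logits])  # shapes (2, 7) and (2, 46)
```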

Perturbation-based Self-supervised Attention for Attention Bias in Text Classification

May 25, 2023
Huawen Feng, Zhenxi Lin, Qianli Ma

In text classification, traditional attention mechanisms usually focus too much on frequent words and need extensive labeled data in order to learn. This paper proposes a perturbation-based self-supervised attention approach to guide attention learning without any annotation overhead. Specifically, we add as much noise as possible to all the words in a sentence without changing their semantics or the model's predictions. We hypothesize that words that tolerate more noise are less significant, and we can use this information to refine the attention distribution. Experimental results on three text classification tasks show that our approach can significantly improve the performance of current attention-based models and is more effective than existing self-supervised methods. We also provide a visualization analysis to verify the effectiveness of our approach.
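
The core measurement, how much noise each word tolerates before the prediction flips, can be sketched as follows, assuming a black-box predict_fn over word embeddings; the noise schedule and the conversion from tolerance to attention targets are illustrative simplifications, not the paper's procedure.

```python
import numpy as np

def noise_tolerance(predict_fn, word_embs, scales=np.linspace(0.0, 2.0, 9), trials=5, seed=0):
    """Estimate, per word, the largest noise scale that leaves the prediction unchanged.

    predict_fn maps a (num_words, dim) embedding matrix to a class id.
    Words that tolerate less noise are treated as more important, and that
    importance can be used as a soft target for the attention distribution.
    """
    rng = np.random.default_rng(seed)
    base = predict_fn(word_embs)
    tolerance = np.zeros(len(word_embs))
    for i in range(len(word_embs)):
        for scale in scales:
            stable = all(
                predict_fn(word_embs + _noise_at(rng, word_embs.shape, i, scale)) == base
                for _ in range(trials)
            )
            if not stable:
                break
            tolerance[i] = scale
    importance = np.exp(-tolerance) / np.exp(-tolerance).sum()  # less tolerant -> more important
    return tolerance, importance

def _noise_at(rng, shape, row, scale):
    """Gaussian noise applied to a single word's embedding row."""
    noise = np.zeros(shape)
    noise[row] = rng.normal(0.0, scale, size=shape[1])
    return noise
```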

Retrieval-augmented Multi-label Text Classification

May 22, 2023
Ilias Chalkidis, Yova Kementchedjhieva

Multi-label text classification (MLC) is a challenging task in settings with large label sets, where label support follows a Zipfian distribution. In this paper, we address this problem through retrieval augmentation, aiming to improve the sample efficiency of classification models. Our approach closely follows the standard MLC architecture of a Transformer-based encoder paired with a set of classification heads. In our case, however, the input document representation is augmented through cross-attention to similar documents retrieved from the training set and represented in a task-specific manner. We evaluate this approach on four datasets from the legal and biomedical domains, all of which feature highly skewed label distributions. Our experiments show that retrieval augmentation substantially improves model performance on the long tail of infrequent labels, particularly in lower-resource training scenarios and on more challenging long-document data.
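
A minimal sketch of the architectural idea, a document representation cross-attending to retrieved neighbour representations before per-label classification, is given below in PyTorch; the retriever, the task-specific neighbour representations, and all dimensions are assumptions of this illustration rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class RetrievalAugmentedMLC(nn.Module):
    """Toy retrieval-augmented multi-label head (a sketch, not the paper's architecture)."""
    def __init__(self, hidden_size, num_labels, num_heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, doc_repr, neighbour_reprs):
        # doc_repr: (batch, 1, hidden); neighbour_reprs: (batch, k, hidden) from the retriever
        fused, _ = self.cross_attn(doc_repr, neighbour_reprs, neighbour_reprs)
        logits = self.classifier((doc_repr + fused).squeeze(1))
        return torch.sigmoid(logits)  # independent per-label probabilities

model = RetrievalAugmentedMLC(hidden_size=768, num_labels=100)
probs = model(torch.randn(2, 1, 768), torch.randn(2, 5, 768))
print(probs.shape)  # torch.Size([2, 100])
```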