Kalina Bontcheva

University of Sheffield

Don't Waste a Single Annotation: Improving Single-Label Classifiers Through Soft Labels

Nov 09, 2023

Ben Wu, Yue Li, Yida Mu, Carolina Scarton, Kalina Bontcheva, Xingyi Song

Abstract: In this paper, we address the limitations of the common data annotation and training methods for objective single-label classification tasks. Typically, when annotating such tasks, annotators are only asked to provide a single label for each sample, and annotator disagreement is discarded when a final hard label is decided through majority voting. We challenge this traditional approach, acknowledging that determining the appropriate label can be difficult due to the ambiguity and lack of context in the data samples. Rather than discarding the information from such ambiguous annotations, our soft label method makes use of them for training. Our findings indicate that additional annotator information, such as confidence, secondary label and disagreement, can be used to effectively generate soft labels. Training classifiers with these soft labels then leads to improved performance and calibration on the hard label test set.

* Accepted to EMNLP 2023 (Findings)
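
The abstract above does not give the exact formula used to build soft labels, so the following is only a minimal sketch of the general idea, under stated assumptions: each annotation carries a primary label, a confidence in [0, 1] and an optional secondary label; the leftover probability mass goes to the secondary label if one was given, otherwise it is spread uniformly over the other classes. The function names (`soft_label`, `average_soft_labels`, `soft_cross_entropy`) are hypothetical and are not taken from the paper.

```python
import math


def soft_label(primary, confidence, num_classes, secondary=None):
    """Turn one annotation into a probability distribution over classes.

    Assumption (not from the paper): the annotator's confidence is the mass
    on the primary label; the remainder goes to the secondary label if one
    was given, otherwise it is split evenly over all other classes.
    Expects num_classes >= 2.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    rest = 1.0 - confidence
    dist = [0.0] * num_classes
    dist[primary] = confidence
    if secondary is not None and secondary != primary:
        dist[secondary] += rest
    else:
        others = [c for c in range(num_classes) if c != primary]
        for c in others:
            dist[c] += rest / len(others)
    return dist


def average_soft_labels(dists):
    """Combine several annotators' distributions by simple averaging."""
    n = len(dists)
    return [sum(d[c] for d in dists) / n for c in range(len(dists[0]))]


def soft_cross_entropy(predicted, target):
    """Cross-entropy between a predicted distribution and a soft target."""
    return -sum(t * math.log(p) for p, t in zip(predicted, target) if t > 0)


# Two annotators disagree on a 3-class sample; neither vote is discarded.
annotator_a = soft_label(primary=0, confidence=0.7, num_classes=3, secondary=2)
annotator_b = soft_label(primary=1, confidence=0.8, num_classes=3)
target = average_soft_labels([annotator_a, annotator_b])
loss = soft_cross_entropy([0.5, 0.3, 0.2], target)
```

Majority voting would collapse this sample to a single class (or a tie), whereas the averaged distribution keeps both annotators' uncertainty available to the training loss.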

Analysing State-Backed Propaganda Websites: a New Dataset and Linguistic Study

Oct 21, 2023

Freddy Heppell, Kalina Bontcheva, Carolina Scarton

Abstract: This paper analyses two hitherto unstudied sites sharing state-backed disinformation, Reliable Recent News (rrn.world) and WarOnFakes (waronfakes.com), which publish content in Arabic, Chinese, English, French, German, and Spanish. We describe our content acquisition methodology and perform cross-site unsupervised topic clustering on the resulting multilingual dataset. We also perform linguistic and temporal analysis of the web page translations and topics over time, and investigate articles with false publication dates. We make publicly available this new dataset of 14,053 articles, annotated with each language version, and additional metadata such as links and images. The main contribution of this paper for the NLP community is in the novel dataset which enables studies of disinformation networks, and the training of NLP tools for disinformation detection.

* Accepted to EMNLP 2023 main conference

Examining Temporal Bias in Abusive Language Detection

Sep 25, 2023

Mali Jin, Yida Mu, Diana Maynard, Kalina Bontcheva

Abstract: The use of abusive language online has become an increasingly pervasive problem that damages both individuals and society, with effects ranging from psychological harm right through to escalation to real-life violence and even death. Machine learning models have been developed to automatically detect abusive language, but these models can suffer from temporal bias, the phenomenon in which topics, language use or social norms change over time. This study aims to investigate the nature and impact of temporal bias in abusive language detection across various languages and explore mitigation methods. We evaluate the performance of models on abusive data sets from different time periods. Our results demonstrate that temporal bias is a significant challenge for abusive language detection, with models trained on historical data showing a significant drop in performance over time. We also present an extensive linguistic analysis of these abusive data sets from a diachronic perspective, aiming to explore the reasons for language evolution and performance decline. This study sheds light on the pervasive issue of temporal bias in abusive language detection across languages, offering crucial insights into language evolution and temporal bias mitigation.

Examining the Limitations of Computational Rumor Detection Models Trained on Static Datasets

Sep 20, 2023

Yida Mu, Xingyi Song, Kalina Bontcheva, Nikolaos Aletras

Abstract: A crucial aspect of a rumor detection model is its ability to generalize, particularly its ability to detect emerging, previously unknown rumors. Past research has indicated that content-based (i.e., using solely source posts as input) rumor detection models tend to perform less effectively on unseen rumors. At the same time, the potential of context-based models remains largely untapped. The main contribution of this paper is in the in-depth evaluation of the performance gap between content and context-based models specifically on detecting new, unseen rumors. Our empirical findings demonstrate that context-based models are still overly dependent on the information derived from the rumors' source post and tend to overlook the significant role that contextual information can play. We also study the effect of data split strategies on classifier performance. Based on our experimental results, the paper also offers practical suggestions on how to minimize the effects of temporal concept drift in static datasets during the training of rumor detection methods.

Detecting Misinformation with LLM-Predicted Credibility Signals and Weak Supervision

Sep 14, 2023

João A. Leite, Olesya Razuvayevskaya, Kalina Bontcheva, Carolina Scarton

Abstract: Credibility signals represent a wide range of heuristics that are typically used by journalists and fact-checkers to assess the veracity of online content. Automating the task of credibility signal extraction, however, is very challenging as it requires high-accuracy signal-specific extractors to be trained, while there are currently no sufficiently large datasets annotated with all credibility signals. This paper investigates whether large language models (LLMs) can be prompted effectively with a set of 18 credibility signals to produce weak labels for each signal. We then aggregate these potentially noisy labels using weak supervision in order to predict content veracity. We demonstrate that our approach, which combines zero-shot LLM credibility signal labeling and weak supervision, outperforms state-of-the-art classifiers on two misinformation datasets without using any ground-truth labels for training. We also analyse the contribution of the individual credibility signals towards predicting content veracity, which provides new valuable insights into their role in misinformation detection.
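
The abstract above does not say which weak supervision model aggregates the per-signal labels. As a rough illustration only, the sketch below swaps in the simplest possible aggregator, a majority vote with abstentions, which is a common baseline and not necessarily the authors' method. The `ABSTAIN` constant, the label encoding (1 = misinformation, 0 = credible) and the function name `majority_vote` are all assumptions made for this example.

```python
from collections import Counter

ABSTAIN = -1  # hypothetical marker: the signal produced no usable label


def majority_vote(signal_votes, default=ABSTAIN):
    """Aggregate noisy per-signal labels for one article into one label.

    Abstentions are ignored; ties and all-abstain cases return `default`.
    """
    votes = [v for v in signal_votes if v != ABSTAIN]
    if not votes:
        return default
    counts = Counter(votes).most_common()
    # A tie between the two most frequent labels is treated as undecided.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return default
    return counts[0][0]


# One row per article, one column per (hypothetical) credibility signal.
weak_labels = [
    [1, 1, 0, ABSTAIN],
    [0, ABSTAIN, 0, 0],
    [1, 0, ABSTAIN, ABSTAIN],
]
predictions = [majority_vote(row) for row in weak_labels]
```

A learned label model would additionally weight each signal by its estimated accuracy; the majority vote treats all signals as equally reliable.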

Comparison between parameter-efficient techniques and full fine-tuning: A case study on multilingual news article classification

Aug 14, 2023

Olesya Razuvayevskaya, Ben Wu, João A. Leite, Freddy Heppell, Ivan Srba, Carolina Scarton, Kalina Bontcheva, Xingyi Song

Abstract: Adapters and Low-Rank Adaptation (LoRA) are parameter-efficient fine-tuning techniques designed to make the training of language models more efficient. Previous results demonstrated that these methods can even improve performance on some classification tasks. This paper complements the existing research by investigating how these techniques influence the classification performance and computation costs compared to full fine-tuning when applied to multilingual text classification tasks (genre, framing, and persuasion techniques detection; with different input lengths, number of predicted classes and classification difficulty), some of which have limited training data. In addition, we conduct in-depth analyses of their efficacy across different training scenarios (training on the original multilingual data; on the translations into English; and on a subset of English-only data) and different languages. Our findings provide valuable insights into the applicability of the parameter-efficient fine-tuning techniques, particularly to complex multilingual and multilabel classification tasks.

Finding Already Debunked Narratives via Multistage Retrieval: Enabling Cross-Lingual, Cross-Dataset and Zero-Shot Learning

Aug 10, 2023

Iknoor Singh, Carolina Scarton, Xingyi Song, Kalina Bontcheva

Abstract: The task of retrieving already debunked narratives aims to detect stories that have already been fact-checked. The successful detection of claims that have already been debunked not only reduces the manual efforts of professional fact-checkers but can also contribute to slowing the spread of misinformation. Mainly due to the lack of readily available data, this is an understudied problem, particularly when considering the cross-lingual task, i.e. the retrieval of fact-checking articles in a language different from the language of the online post being checked. This paper fills this gap by (i) creating a novel dataset to enable research on cross-lingual retrieval of already debunked narratives, using tweets as queries to a database of fact-checking articles; (ii) presenting an extensive experiment to benchmark fine-tuned and off-the-shelf multilingual pre-trained Transformer models for this task; and (iii) proposing a novel multistage framework that divides this cross-lingual debunk retrieval task into refinement and re-ranking stages. Results show that the task of cross-lingual retrieval of already debunked narratives is challenging and off-the-shelf Transformer models fail to outperform a strong lexical-based baseline (BM25). Nevertheless, our multistage retrieval framework is robust, outperforming BM25 in most scenarios and enabling cross-domain and zero-shot learning, without significantly harming the model's performance.

Navigating Prompt Complexity for Zero-Shot Classification: A Study of Large Language Models in Computational Social Science

May 23, 2023

Yida Mu, Ben P. Wu, William Thorne, Ambrose Robinson, Nikolaos Aletras, Carolina Scarton, Kalina Bontcheva, Xingyi Song

Abstract: Instruction-tuned Large Language Models (LLMs) have exhibited impressive language understanding and the capacity to generate responses that follow specific instructions. However, due to the computational demands associated with training these models, their applications often rely on zero-shot settings. In this paper, we evaluate the zero-shot performance of two publicly accessible LLMs, ChatGPT and OpenAssistant, in the context of Computational Social Science classification tasks, while also investigating the effects of various prompting strategies. Our experiment considers the impact of prompt complexity, including the effect of incorporating label definitions into the prompt, using synonyms for label names, and the influence of integrating past memories during the foundation model training. The findings indicate that in a zero-shot setting, the current LLMs are unable to match the performance of smaller, fine-tuned baseline transformer models (such as BERT). Additionally, we find that different prompting strategies can significantly affect classification accuracy, with variations in accuracy and F1 scores exceeding 10%.

A Large-Scale Comparative Study of Accurate COVID-19 Information versus Misinformation

Apr 10, 2023

Yida Mu, Ye Jiang, Freddy Heppell, Iknoor Singh, Carolina Scarton, Kalina Bontcheva, Xingyi Song

Abstract: The COVID-19 pandemic led to an infodemic where an overwhelming amount of COVID-19 related content was being disseminated at high velocity through social media. This made it challenging for citizens to differentiate between accurate and inaccurate information about COVID-19. This motivated us to carry out a comparative study of the characteristics of COVID-19 misinformation versus those of accurate COVID-19 information through a large-scale computational analysis of over 242 million tweets. The study makes comparisons along four key aspects: 1) the distribution of topics, 2) the live status of tweets, 3) language analysis and 4) the spreading power over time. An added contribution of this study is the creation of a COVID-19 misinformation classification dataset. Finally, we demonstrate that this new dataset helps improve misinformation classification by more than 9% based on average F1 measure.

Examining Temporalities on Stance Detection Towards COVID-19 Vaccination

Apr 10, 2023

Yida Mu, Mali Jin, Kalina Bontcheva, Xingyi Song

Abstract: Previous studies have highlighted the importance of vaccination as an effective strategy to control the transmission of the COVID-19 virus. It is crucial for policymakers to have a comprehensive understanding of the public's stance towards vaccination on a large scale. However, attitudes towards COVID-19 vaccination, such as pro-vaccine or vaccine hesitancy, have evolved over time on social media. Thus, it is necessary to account for possible temporal shifts when analysing these stances. This study aims to examine the impact of temporal concept drift on stance detection towards COVID-19 vaccination on Twitter. To this end, we evaluate a range of transformer-based models using chronological and random splits of social media data. Our findings demonstrate significant discrepancies in model performance when comparing random and chronological splits across all monolingual and multilingual datasets. Chronological splits significantly reduce the accuracy of stance classification. Therefore, real-world stance detection approaches need to be further refined to incorporate temporal factors as a key consideration.
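
To make the difference between the two split strategies concrete, here is a minimal sketch. The record layout (a dict with a `date` field), the 20% test fraction and the function names are assumptions made for illustration; the abstract does not specify how the splits were actually constructed.

```python
import random
from datetime import date


def random_split(items, test_fraction=0.2, seed=0):
    """Shuffle, then cut: train and test cover the same time period."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def chronological_split(items, test_fraction=0.2, key=lambda x: x["date"]):
    """Sort by time, then cut: every test item is at least as recent as
    every training item, so the model is evaluated on "future" data."""
    ordered = sorted(items, key=key)
    n_test = round(len(ordered) * test_fraction)
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]


# Ten hypothetical posts, one per month.
posts = [{"id": i, "date": date(2021, 1 + i, 1)} for i in range(10)]
train_rand, test_rand = random_split(posts)
train_chron, test_chron = chronological_split(posts)
```

With a random split the test set is drawn from the same period as the training set, which can hide temporal drift; the chronological split exposes it because the test posts all postdate the training posts.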
