Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xingyi Song

University of Sheffield

Efficient Annotator Reliablity Assessment with EffiARA

Apr 01, 2025

Owen Cook, Jake Vasilakes, Ian Roberts, Xingyi Song

Figure 1 for Efficient Annotator Reliablity Assessment with EffiARA

Figure 2 for Efficient Annotator Reliablity Assessment with EffiARA

Figure 3 for Efficient Annotator Reliablity Assessment with EffiARA

Abstract:Data annotation is an essential component of the machine learning pipeline; it is also a costly and time-consuming process. With the introduction of transformer-based models, annotation at the document level is increasingly popular; however, there is no standard framework for structuring such tasks. The EffiARA annotation framework is, to our knowledge, the first project to support the whole annotation pipeline, from understanding the resources required for an annotation task to compiling the annotated dataset and gaining insights into the reliability of individual annotators as well as the dataset as a whole. The framework's efficacy is supported by two previous studies: one improving classification performance through annotator-reliability-based soft label aggregation and sample weighting, and the other increasing the overall agreement among annotators through removing identifying and replacing an unreliable annotator. This work introduces the EffiARA Python package and its accompanying webtool, which provides an accessible graphical user interface for the system. We open-source the EffiARA Python package at https://github.com/MiniEggz/EffiARA and the webtool is publicly accessible at https://effiara.gate.ac.uk.

Via

Access Paper or Ask Questions

A Dataset for Analysing News Framing in Chinese Media

Mar 06, 2025

Owen Cook, Yida Mu, Xinye Yang, Xingyi Song, Kalina Bontcheva

Abstract:Framing is an essential device in news reporting, allowing the writer to influence public perceptions of current affairs. While there are existing automatic news framing detection datasets in various languages, none of them focus on news framing in the Chinese language which has complex character meanings and unique linguistic features. This study introduces the first Chinese News Framing dataset, to be used as either a stand-alone dataset or a supplementary resource to the SemEval-2023 task 3 dataset. We detail its creation and we run baseline experiments to highlight the need for such a dataset and create benchmarks for future research, providing results obtained through fine-tuning XLM-RoBERTa-Base and using GPT-4o in the zero-shot setting. We find that GPT-4o performs significantly worse than fine-tuned XLM-RoBERTa across all languages. For the Chinese language, we obtain an F1-micro (the performance metric for SemEval task 3, subtask 2) score of 0.719 using only samples from our Chinese News Framing dataset and a score of 0.753 when we augment the SemEval dataset with Chinese news framing samples. With positive news frame detection results, this dataset is a valuable resource for detecting news frames in the Chinese language and is a valuable supplement to the SemEval-2023 task 3 dataset.

Via

Access Paper or Ask Questions

Efficient Annotator Reliability Assessment and Sample Weighting for Knowledge-Based Misinformation Detection on Social Media

Oct 18, 2024

Owen Cook, Charlie Grimshaw, Ben Wu, Sophie Dillon, Jack Hicks, Luke Jones, Thomas Smith, Matyas Szert, Xingyi Song

Figure 1 for Efficient Annotator Reliability Assessment and Sample Weighting for Knowledge-Based Misinformation Detection on Social Media

Figure 2 for Efficient Annotator Reliability Assessment and Sample Weighting for Knowledge-Based Misinformation Detection on Social Media

Figure 3 for Efficient Annotator Reliability Assessment and Sample Weighting for Knowledge-Based Misinformation Detection on Social Media

Figure 4 for Efficient Annotator Reliability Assessment and Sample Weighting for Knowledge-Based Misinformation Detection on Social Media

Abstract:Misinformation spreads rapidly on social media, confusing the truth and targetting potentially vulnerable people. To effectively mitigate the negative impact of misinformation, it must first be accurately detected before applying a mitigation strategy, such as X's community notes, which is currently a manual process. This study takes a knowledge-based approach to misinformation detection, modelling the problem similarly to one of natural language inference. The EffiARA annotation framework is introduced, aiming to utilise inter- and intra-annotator agreement to understand the reliability of each annotator and influence the training of large language models for classification based on annotator reliability. In assessing the EffiARA annotation framework, the Russo-Ukrainian Conflict Knowledge-Based Misinformation Classification Dataset (RUC-MCD) was developed and made publicly available. This study finds that sample weighting using annotator reliability performs the best, utilising both inter- and intra-annotator agreement and soft-label training. The highest classification performance achieved using Llama-3.2-1B was a macro-F1 of 0.757 and 0.740 using TwHIN-BERT-large.

* 8 pages, 3 figures, 3 tables. Code available here: https://github.com/MiniEggz/ruc-misinfo

Via

Access Paper or Ask Questions

Enhancing Data Quality through Simple De-duplication: Navigating Responsible Computational Social Science Research

Oct 04, 2024

Yida Mu, Mali Jin, Xingyi Song, Nikolaos Aletras

Figure 1 for Enhancing Data Quality through Simple De-duplication: Navigating Responsible Computational Social Science Research

Figure 2 for Enhancing Data Quality through Simple De-duplication: Navigating Responsible Computational Social Science Research

Figure 3 for Enhancing Data Quality through Simple De-duplication: Navigating Responsible Computational Social Science Research

Figure 4 for Enhancing Data Quality through Simple De-duplication: Navigating Responsible Computational Social Science Research

Abstract:Research in natural language processing (NLP) for Computational Social Science (CSS) heavily relies on data from social media platforms. This data plays a crucial role in the development of models for analysing socio-linguistic phenomena within online communities. In this work, we conduct an in-depth examination of 20 datasets extensively used in NLP for CSS to comprehensively examine data quality. Our analysis reveals that social media datasets exhibit varying levels of data duplication. Consequently, this gives rise to challenges like label inconsistencies and data leakage, compromising the reliability of models. Our findings also suggest that data duplication has an impact on the current claims of state-of-the-art performance, potentially leading to an overestimation of model effectiveness in real-world scenarios. Finally, we propose new protocols and best practices for improving dataset development from social media data and its usage.

* Accepted at EMNLP 2024 Main

Via

Access Paper or Ask Questions

Cross-Modal Augmentation for Few-Shot Multimodal Fake News Detection

Jul 16, 2024

Ye Jiang, Taihang Wang, Xiaoman Xu, Yimin Wang, Xingyi Song, Diana Maynard

Figure 1 for Cross-Modal Augmentation for Few-Shot Multimodal Fake News Detection

Figure 2 for Cross-Modal Augmentation for Few-Shot Multimodal Fake News Detection

Figure 3 for Cross-Modal Augmentation for Few-Shot Multimodal Fake News Detection

Figure 4 for Cross-Modal Augmentation for Few-Shot Multimodal Fake News Detection

Abstract:The nascent topic of fake news requires automatic detection methods to quickly learn from limited annotated samples. Therefore, the capacity to rapidly acquire proficiency in a new task with limited guidance, also known as few-shot learning, is critical for detecting fake news in its early stages. Existing approaches either involve fine-tuning pre-trained language models which come with a large number of parameters, or training a complex neural network from scratch with large-scale annotated datasets. This paper presents a multimodal fake news detection model which augments multimodal features using unimodal features. For this purpose, we introduce Cross-Modal Augmentation (CMA), a simple approach for enhancing few-shot multimodal fake news detection by transforming n-shot classification into a more robust (n $\times$ z)-shot problem, where z represents the number of supplementary features. The proposed CMA achieves SOTA results over three benchmark datasets, utilizing a surprisingly simple linear probing method to classify multimodal fake news with only a few training samples. Furthermore, our method is significantly more lightweight than prior approaches, particularly in terms of the number of trainable parameters and epoch times. The code is available here: \url{https://github.com/zgjiangtoby/FND_fewshot}

Via

Access Paper or Ask Questions

Confidence Regulation Neurons in Language Models

Jun 24, 2024

Alessandro Stolfo, Ben Wu, Wes Gurnee, Yonatan Belinkov, Xingyi Song, Mrinmaya Sachan, Neel Nanda

Abstract:Despite their widespread use, the mechanisms by which large language models (LLMs) represent and regulate uncertainty in next-token predictions remain largely unexplored. This study investigates two critical components believed to influence this uncertainty: the recently discovered entropy neurons and a new set of components that we term token frequency neurons. Entropy neurons are characterized by an unusually high weight norm and influence the final layer normalization (LayerNorm) scale to effectively scale down the logits. Our work shows that entropy neurons operate by writing onto an unembedding null space, allowing them to impact the residual stream norm with minimal direct effect on the logits themselves. We observe the presence of entropy neurons across a range of models, up to 7 billion parameters. On the other hand, token frequency neurons, which we discover and describe here for the first time, boost or suppress each token's logit proportionally to its log frequency, thereby shifting the output distribution towards or away from the unigram distribution. Finally, we present a detailed case study where entropy neurons actively manage confidence in the setting of induction, i.e. detecting and continuing repeated subsequences.

* 25 pages, 14 figures

Via

Access Paper or Ask Questions

Identifying and Aligning Medical Claims Made on Social Media with Medical Evidence

May 18, 2024

Anthony Hughes, Xingyi Song

Figure 1 for Identifying and Aligning Medical Claims Made on Social Media with Medical Evidence

Figure 2 for Identifying and Aligning Medical Claims Made on Social Media with Medical Evidence

Figure 3 for Identifying and Aligning Medical Claims Made on Social Media with Medical Evidence

Figure 4 for Identifying and Aligning Medical Claims Made on Social Media with Medical Evidence

Abstract:Evidence-based medicine is the practice of making medical decisions that adhere to the latest, and best known evidence at that time. Currently, the best evidence is often found in the form of documents, such as randomized control trials, meta-analyses and systematic reviews. This research focuses on aligning medical claims made on social media platforms with this medical evidence. By doing so, individuals without medical expertise can more effectively assess the veracity of such medical claims. We study three core tasks: identifying medical claims, extracting medical vocabulary from these claims, and retrieving evidence relevant to those identified medical claims. We propose a novel system that can generate synthetic medical claims to aid each of these core tasks. We additionally introduce a novel dataset produced by our synthetic generator that, when applied to these tasks, demonstrates not only a more flexible and holistic approach, but also an improvement in all comparable metrics. We make our dataset, the Expansive Medical Claim Corpus (EMCC), available at https://zenodo.org/records/8321460

Via

Access Paper or Ask Questions

Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling

May 01, 2024

Yida Mu, Peizhen Bai, Kalina Bontcheva, Xingyi Song

Figure 1 for Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling

Figure 2 for Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling

Figure 3 for Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling

Figure 4 for Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling

Abstract:Large language models (LLMs) with their strong zero-shot topic extraction capabilities offer an alternative to probabilistic topic modelling and closed-set topic classification approaches. As zero-shot topic extractors, LLMs are expected to understand human instructions to generate relevant and non-hallucinated topics based on the given documents. However, LLM-based topic modelling approaches often face difficulties in generating topics with adherence to granularity as specified in human instructions, often resulting in many near-duplicate topics. Furthermore, methods for addressing hallucinated topics generated by LLMs have not yet been investigated. In this paper, we focus on addressing the issues of topic granularity and hallucinations for better LLM-based topic modelling. To this end, we introduce a novel approach that leverages Direct Preference Optimisation (DPO) to fine-tune open-source LLMs, such as Mistral-7B. Our approach does not rely on traditional human annotation to rank preferred answers but employs a reconstruction pipeline to modify raw topics generated by LLMs, thus enabling a fast and efficient training and inference framework. Comparative experiments show that our fine-tuning approach not only significantly improves the LLM's capability to produce more coherent, relevant, and precise topics, but also reduces the number of hallucinated topics.

Via

Access Paper or Ask Questions

Large Language Models Offer an Alternative to the Traditional Approach of Topic Modelling

Mar 26, 2024

Yida Mu, Chun Dong, Kalina Bontcheva, Xingyi Song

Abstract:Topic modelling, as a well-established unsupervised technique, has found extensive use in automatically detecting significant topics within a corpus of documents. However, classic topic modelling approaches (e.g., LDA) have certain drawbacks, such as the lack of semantic understanding and the presence of overlapping topics. In this work, we investigate the untapped potential of large language models (LLMs) as an alternative for uncovering the underlying topics within extensive text corpora. To this end, we introduce a framework that prompts LLMs to generate topics from a given set of documents and establish evaluation protocols to assess the clustering efficacy of LLMs. Our findings indicate that LLMs with appropriate prompts can stand out as a viable alternative, capable of generating relevant topic titles and adhering to human guidelines to refine and merge topics. Through in-depth experiments and evaluation, we summarise the advantages and constraints of employing LLMs in topic extraction.

* Accepted at LREC-COLING 2024

Via

Access Paper or Ask Questions

Don't Waste a Single Annotation: Improving Single-Label Classifiers Through Soft Labels

Nov 09, 2023

Ben Wu, Yue Li, Yida Mu, Carolina Scarton, Kalina Bontcheva, Xingyi Song

Figure 1 for Don't Waste a Single Annotation: Improving Single-Label Classifiers Through Soft Labels

Figure 2 for Don't Waste a Single Annotation: Improving Single-Label Classifiers Through Soft Labels

Figure 3 for Don't Waste a Single Annotation: Improving Single-Label Classifiers Through Soft Labels

Figure 4 for Don't Waste a Single Annotation: Improving Single-Label Classifiers Through Soft Labels

Abstract:In this paper, we address the limitations of the common data annotation and training methods for objective single-label classification tasks. Typically, when annotating such tasks annotators are only asked to provide a single label for each sample and annotator disagreement is discarded when a final hard label is decided through majority voting. We challenge this traditional approach, acknowledging that determining the appropriate label can be difficult due to the ambiguity and lack of context in the data samples. Rather than discarding the information from such ambiguous annotations, our soft label method makes use of them for training. Our findings indicate that additional annotator information, such as confidence, secondary label and disagreement, can be used to effectively generate soft labels. Training classifiers with these soft labels then leads to improved performance and calibration on the hard label test set.

* Accepted to EMNLP 2023 (Findings)

Via

Access Paper or Ask Questions