Aleksandra Edwards

SemEval-2026 Task 7: Everyday Knowledge Across Diverse Languages and Cultures

May 04, 2026

Nedjma Ousidhoum, Junho Myung, Carla Perez-Almendros, Jiho Jin, Amr Keleg, Meriem Beloucif, Yi Zhou, Rodrigo Agerri, Vladimir Araujo, Naomi Baes(+20 more)

Abstract: We present our shared task on evaluating the adaptability of LLMs and NLP systems across multiple languages and cultures. The task data consist of an extended version of our manually constructed BLEnD benchmark (Myung et al. 2024), covering more than 30 language-culture pairs, predominantly representing low-resource languages spoken across multiple continents. As the task is designed strictly for evaluation, participants were not permitted to use the data for training, fine-tuning, few-shot learning, or any other form of model modification. Our task includes two tracks: (a) Short-Answer Questions (SAQ) and (b) Multiple-Choice Questions (MCQ). Participants were required to predict labels and were allowed to submit any NLP system and adopt diverse modelling strategies, provided that the benchmark was used solely for evaluation. The task attracted more than 140 registered participants, and we received final submissions from 62 teams, along with 19 system description papers. We report the results and present an analysis of the best-performing systems and the most commonly adopted approaches. Furthermore, we discuss shared insights into open questions and challenges related to evaluation, misalignment, and methodological perspectives on model behaviour in low-resource languages and for under-represented cultures.

* SemEval-2026 Task Description Paper. Data and resources are available at https://github.com/BLEnD-SemEval2026/SemEval-2026-Task-7

Language Models for Text Classification: Is In-Context Learning Enough?

Mar 26, 2024

Aleksandra Edwards, Jose Camacho-Collados

Abstract: Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings. An advantage of these models over more standard approaches based on fine-tuning is the ability to understand instructions written in natural language (prompts), which helps them generalise better to different tasks and domains without the need for specific training data. This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances. However, existing research is limited in scale and lacks understanding of how text generation models combined with prompting techniques compare to more established methods for text classification such as fine-tuning masked language models. In this paper, we address this research gap by performing a large-scale evaluation study for 16 text classification datasets covering binary, multiclass, and multilabel problems. In particular, we compare zero- and few-shot approaches of large language models to fine-tuning smaller language models. We also analyse the results by prompt, classification type, domain, and number of labels. In general, the results show how fine-tuning smaller and more efficient language models can still outperform few-shot approaches of larger language models, which have room for improvement when it comes to text classification.

* Accepted at LREC-COLING 2024

Guiding Generative Language Models for Data Augmentation in Few-Shot Text Classification

Nov 17, 2021

Aleksandra Edwards, Asahi Ushio, Jose Camacho-Collados, Hélène de Ribaupierre, Alun Preece

Abstract: Data augmentation techniques are widely used for enhancing the performance of machine learning models by tackling class imbalance issues and data sparsity. State-of-the-art generative language models have been shown to provide significant gains across different NLP tasks. However, their applicability to data augmentation for text classification tasks in few-shot settings has not been fully explored, especially for specialised domains. In this paper, we leverage GPT-2 (Radford et al., 2019) for generating artificial training instances in order to improve classification performance. Our aim is to analyse the impact that the selection process of seed training examples has on the quality of GPT-generated samples and consequently on classifier performance. We perform experiments with several seed selection strategies that, among others, exploit class hierarchical structures and domain expert selection. Our results show that fine-tuning GPT-2 on a handful of labelled instances leads to consistent classification improvements and outperforms competitive baselines. Finally, we show that guiding this process through domain expert selection can lead to further improvements, which opens up interesting research avenues for combining generative models and active learning.

* 14 pages, 4 figures, 10 tables

Predicting Themes within Complex Unstructured Texts: A Case Study on Safeguarding Reports

Oct 29, 2020

Aleksandra Edwards, David Rogers, Jose Camacho-Collados, Hélène de Ribaupierre, Alun Preece

Abstract: The task of text and sentence classification is associated with the need for large amounts of labelled training data. The acquisition of high volumes of labelled datasets can be expensive or unfeasible, especially for highly specialised domains for which documents are hard to obtain. Research on the application of supervised classification based on small amounts of training data is limited. In this paper, we address the combination of state-of-the-art deep learning and classification methods and provide insight into which combinations of methods fit the needs of small, domain-specific, and terminologically rich corpora. We focus on a real-world scenario related to a collection of safeguarding reports comprising learning experiences and reflections on tackling serious incidents involving children and vulnerable adults. The relatively small volume of available reports and their use of highly domain-specific terminology make the application of automated approaches difficult. We focus on the problem of automatically identifying the main themes in a safeguarding report using supervised classification approaches. Our results show the potential of deep learning models to simulate subject-expert behaviour even for complex tasks with limited labelled data.

* 10 pages, 5 figures, workshop
