Abstract: This work proposes a contextualised detection framework for implicitly hateful speech, implemented as a multi-agent system comprising a central Moderator Agent and dynamically constructed Community Agents representing specific demographic groups. Our approach explicitly integrates socio-cultural context from publicly available knowledge sources, enabling identity-aware moderation that surpasses state-of-the-art prompting methods (zero-shot, few-shot, and chain-of-thought prompting) and alternative approaches on the challenging ToxiGen dataset. We enhance the technical rigour of performance evaluation by incorporating balanced accuracy as a central metric of classification fairness, one that accounts for the trade-off between true positive and true negative rates. We demonstrate that our community-driven consultative framework significantly improves both classification accuracy and fairness across all target groups.
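As a minimal illustration of the metric (not the authors' implementation), balanced accuracy averages the true positive rate and the true negative rate, so a classifier cannot inflate its score on an imbalanced test set by over-predicting the majority class. The function and toy labels below are assumptions for demonstration only.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of true positive rate and true negative rate (binary labels)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tpr = np.mean(y_pred[y_true == 1] == 1)  # sensitivity on hateful posts
    tnr = np.mean(y_pred[y_true == 0] == 0)  # specificity on benign posts
    return (tpr + tnr) / 2

# Toy check: 8 benign posts classified correctly, both hateful posts missed.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
print(balanced_accuracy(y_true, y_pred))  # 0.5, despite 0.8 plain accuracy
```

For binary labels this agrees with sklearn.metrics.balanced_accuracy_score; the toy run shows why plain accuracy can mask a model that never flags hateful content.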




Abstract: In this paper, we investigate how personalising Large Language Models (Persona-LLMs) with annotator personas affects their sensitivity to hate speech, particularly regarding biases linked to shared or differing identities between annotators and targets. To this end, we employ Google's Gemini and OpenAI's GPT-4.1-mini models together with two persona-prompting methods: shallow persona prompting and deeply contextualised persona development based on Retrieval-Augmented Generation (RAG), which incorporates richer persona profiles. We analyse the impact of using in-group and out-group annotator personas on the models' detection performance and fairness across diverse social groups. This work bridges psychological insights on group identity with advanced NLP techniques, demonstrating that incorporating socio-demographic attributes into LLMs can address bias in automated hate speech detection. Our results highlight both the potential and the limitations of persona-based approaches in reducing bias, offering valuable insights for developing more equitable hate speech detection systems.
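A minimal sketch of the two prompting styles follows; the prompt wording and the retrieve() helper are illustrative assumptions, not the paper's exact prompts or retrieval pipeline.

```python
# Sketch of shallow vs. RAG-enriched persona prompting for hate speech
# annotation. Templates and the retrieve() callable are hypothetical.

def shallow_persona_prompt(persona: str, post: str) -> str:
    # Shallow persona: only a one-line socio-demographic description.
    return (
        f"You are {persona}.\n"
        f"Is the following post hate speech? Answer 'hate' or 'not hate'.\n"
        f"Post: {post}"
    )

def rag_persona_prompt(persona: str, post: str, retrieve) -> str:
    # Deep persona: the description is augmented with background passages
    # retrieved from a knowledge source about the annotator's community.
    context = "\n".join(retrieve(persona, k=3))
    return (
        f"You are {persona}.\n"
        f"Background on your community:\n{context}\n"
        f"Is the following post hate speech? Answer 'hate' or 'not hate'.\n"
        f"Post: {post}"
    )
```

The contrast between in-group and out-group annotators then reduces to varying the persona argument while holding the post fixed and comparing the models' verdicts.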




Abstract: Growing polarisation in society has caught the attention of the scientific community as well as the news media, which devote special issues to this phenomenon. At the same time, the digitalisation of social interactions requires revising concepts from social science regarding the establishment of trust, a key feature of all human interactions, and group polarisation, and calls for new computational tools to process the large quantities of available data. Existing methods seem insufficient to tackle the problem fully; thus, we propose to approach it by investigating the rhetorical strategies employed by individuals in polarising discussions online. To this end, we develop multi-topic and multi-platform corpora with manual annotation of appeals to ethos and pathos, two modes of persuasion in Aristotelian rhetoric. These corpora can be employed for training language models to advance the study of communication strategies online on a large scale. With the use of computational methods, our corpora allow the investigation of recurring patterns in polarising exchanges across discussion topics and media platforms, and support both quantitative and qualitative analyses of the language structures leading to and engaged in polarisation.
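To make the annotation scheme concrete, a hypothetical corpus record might look like the sketch below; the field names and label set are assumptions for illustration, not the corpus's actual schema.

```python
# Hypothetical record structure for an ethos/pathos-annotated corpus.
from dataclasses import dataclass

@dataclass
class AnnotatedUtterance:
    text: str       # the post or comment
    appeal: str     # "ethos", "pathos", or "none"
    topic: str      # discussion topic, e.g. "healthcare"
    platform: str   # source platform, e.g. "Twitter"

sample = AnnotatedUtterance(
    text="As a nurse of 20 years, I can tell you this policy is reckless.",
    appeal="ethos",  # credibility-based appeal to the speaker's character
    topic="healthcare",
    platform="Twitter",
)
```

Records of this shape support both uses named in the abstract: supervised training of appeal classifiers, and grouping by topic and platform for cross-corpus quantitative analysis.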