Abstract: Research on conspiracy theories and related content online has traditionally focused on textual data. To address the increasing prevalence of (audio-)visual data on social media, and to capture the evolving and dynamic nature of this communication, researchers have begun to explore the potential of unsupervised approaches for analyzing multimodal online content. Our research contributes to this field by exploring the potential of multimodal topic modeling for analyzing conspiracy theories in German-language Telegram channels. Our work uses the BERTopic topic modeling approach in combination with CLIP for the analysis of textual and visual data. We analyze a corpus of ~40,000 Telegram messages posted in October 2023 in 571 German-language Telegram channels known for disseminating conspiracy theories and other deceptive content. We explore the potential and challenges of this approach for studying a medium-sized corpus of user-generated, text-image online content. We offer insights into the dominant topics across modalities, different text and image genres discovered during the analysis, quantitative inter-modal topic analyses, and a qualitative case study of textual, visual, and multimodal narrative strategies in the communication of conspiracy theories.
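The abstract does not spell out the exact pipeline; the following is a minimal sketch of a BERTopic + CLIP setup following BERTopic's documented multimodal workflow. It assumes `docs` (a list of Telegram message texts) and `images` (a list of image file paths) have already been loaded; the specific CLIP checkpoint is an assumption, not the paper's stated choice.

```python
from bertopic import BERTopic
from bertopic.backend import MultiModalBackend
from bertopic.representation import VisualRepresentation

# CLIP-based backend that embeds texts and images into a shared vector space.
# For German-language text, a multilingual CLIP variant such as
# "clip-ViT-B-32-multilingual-v1" may be preferable.
embedding_model = MultiModalBackend("clip-ViT-B-32", batch_size=32)

# Adds image-based topic representations alongside the usual keyword lists.
representation_model = {"Visual_Aspect": VisualRepresentation()}

topic_model = BERTopic(
    embedding_model=embedding_model,
    representation_model=representation_model,
)

# docs: list[str] of message texts; images: list[str] of image paths.
topics, probs = topic_model.fit_transform(documents=docs, images=images)
print(topic_model.get_topic_info().head())
```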
Abstract: The automated detection of conspiracy theories online typically relies on supervised learning. However, creating such training data requires expertise, time, and mental resilience, given the often harmful content. Moreover, available datasets are predominantly in English and often keyword-based, introducing a token-level bias into the models. Our work addresses the task of detecting conspiracy theories in German Telegram messages. We compare the performance of supervised fine-tuning approaches using BERT-like models with prompt-based approaches using Llama2, GPT-3.5, and GPT-4, which require little or no additional training data. We use a dataset of $\sim\!\! 4,000$ messages collected during the COVID-19 pandemic, without the use of keyword filters. Our findings demonstrate that both approaches can be leveraged effectively: for supervised fine-tuning, we report an F1 score of $\sim\!\! 0.8$ for the positive class, making our model comparable to recent models trained on keyword-focused English corpora. We demonstrate our model's adaptability to intra-domain temporal shifts, achieving F1 scores of $\sim\!\! 0.7$. Among the prompting variants, the best-performing model is GPT-4, which achieves an F1 score of $\sim\!\! 0.8$ for the positive class in a zero-shot setting when equipped with a custom conspiracy theory definition.
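The paper's actual prompt and custom definition are not reproduced in the abstract; the sketch below is an illustrative zero-shot classification setup using the OpenAI Python client, with a placeholder definition standing in for the paper's own.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder working definition; the paper's custom definition is not given
# in the abstract.
DEFINITION = (
    "A conspiracy theory is the belief that a small group of powerful actors "
    "secretly coordinates to achieve a malicious goal against the common good."
)

def classify(message: str) -> str:
    """Zero-shot binary classification of a German Telegram message."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {
                "role": "system",
                "content": f"{DEFINITION}\nAnswer with exactly one word: "
                           "YES if the message promotes a conspiracy theory, "
                           "NO otherwise.",
            },
            {"role": "user", "content": message},
        ],
    )
    return response.choices[0].message.content.strip()
```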
Abstract: The Perspective API, a popular text toxicity assessment service by Google and Jigsaw, has found wide adoption in several application areas, notably content moderation, monitoring, and social media research. We examine its potential and limitations for the detection of antisemitic online content, which, by definition, falls under the umbrella term of toxicity. Using a manually annotated German-language dataset comprising around 3,600 posts from Telegram and Twitter, we explore how toxic antisemitic texts are rated and how the toxicity scores differ with respect to different subforms of antisemitism and the stance expressed in the texts. We show that, on a basic level, Perspective API recognizes antisemitic content as toxic, but shows critical weaknesses with respect to non-explicit forms of antisemitism and texts taking a critical stance towards it. Furthermore, using simple text manipulations, we demonstrate that the use of widespread antisemitic codes can substantially reduce API scores, making it rather easy to bypass content moderation based on the service's results.
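For reference, scoring a German-language post with Perspective's TOXICITY attribute looks roughly as follows. This is a sketch based on the API's public Python quickstart; `API_KEY` and the two post variables are placeholders, and no real message content is reproduced.

```python
from googleapiclient import discovery

API_KEY = "..."  # placeholder: your Perspective API key

client = discovery.build(
    "commentanalyzer",
    "v1alpha1",
    developerKey=API_KEY,
    discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
    static_discovery=False,
)

def toxicity(text: str) -> float:
    """Return Perspective's summary TOXICITY score (0..1) for a German text."""
    request = {
        "comment": {"text": text},
        "languages": ["de"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = client.comments().analyze(body=request).execute()
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

# The paper's manipulation probe compares scores for a post before and after
# substituting explicit terms with coded ones; a lowered score indicates an
# easy evasion path for moderation built on the API.
original_post = "..."     # a post using explicit terms (elided)
manipulated_post = "..."  # the same post with coded substitutions (elided)
print(toxicity(original_post), toxicity(manipulated_post))
```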
Abstract: Over the course of the COVID-19 pandemic, existing conspiracy theories were refreshed and new ones were created, often interwoven with antisemitic narratives, stereotypes, and codes. The sheer volume of antisemitic and conspiracy theory content on the Internet makes data-driven algorithmic approaches essential for anti-discrimination organizations and researchers alike. However, the manifestation and dissemination of these two interrelated phenomena remain under-researched in empirical studies of large text corpora. Algorithmic approaches to the detection and classification of specific content usually require labeled datasets, annotated on the basis of conceptually sound guidelines. While there is a growing number of datasets for the more general phenomenon of hate speech, the development of corpora and annotation guidelines for antisemitic and conspiracy theory content is still in its infancy, especially for languages other than English. We contribute to closing this gap by developing an annotation guide for antisemitic and conspiracy theory online content in the context of the COVID-19 pandemic. We provide working definitions, including ones for specific forms of antisemitism such as encoded and post-Holocaust antisemitism. We use these to annotate a German-language dataset consisting of ~3,700 Telegram messages sent between 03/2020 and 12/2021.
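The abstract does not specify the label schema; as a purely hypothetical illustration of how such annotations might be represented, grounded only in the categories named above (antisemitism with subforms such as encoded and post-Holocaust, plus conspiracy theory content):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class AntisemitismSubform(Enum):
    # Subforms named in the abstract; the guideline's full taxonomy is not
    # given there, so this enum is illustrative, not exhaustive.
    ENCODED = "encoded"
    POST_HOLOCAUST = "post_holocaust"

@dataclass
class AnnotatedMessage:
    message_id: str
    text: str
    antisemitic: bool
    conspiracy_theory: bool
    subform: Optional[AntisemitismSubform] = None  # only set if antisemitic

# Illustrative record (content elided; no real message reproduced here).
example = AnnotatedMessage(
    message_id="msg_0001",
    text="...",
    antisemitic=True,
    conspiracy_theory=True,
    subform=AntisemitismSubform.ENCODED,
)
```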