
Antske Fokkens


Improving and Evaluating the Detection of Fragmentation in News Recommendations with the Clustering of News Story Chains

Sep 18, 2023
Alessandra Polimeno, Myrthe Reuver, Sanne Vrijenhoek, Antske Fokkens

Figures 1-3 for Improving and Evaluating the Detection of Fragmentation in News Recommendations with the Clustering of News Story Chains

News recommender systems play an increasingly influential role in shaping information access within democratic societies. However, tailoring recommendations to users' specific interests can result in the divergence of information streams. Fragmented access to information poses challenges to the integrity of the public sphere, thereby influencing democracy and public discourse. The Fragmentation metric quantifies the degree of fragmentation of information streams in news recommendations. Accurate measurement of this metric requires the application of Natural Language Processing (NLP) to identify distinct news events, stories, or timelines. This paper presents an extensive investigation of various approaches for quantifying Fragmentation in news recommendations. These approaches are evaluated both intrinsically, by measuring performance on news story clustering, and extrinsically, by assessing the Fragmentation scores of different simulated news recommender scenarios. Our findings demonstrate that agglomerative hierarchical clustering coupled with SentenceBERT text representation is substantially better at detecting Fragmentation than earlier implementations. Additionally, the analysis of simulated scenarios yields valuable insights and recommendations for stakeholders concerning the measurement and interpretation of Fragmentation.
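The clustering setup the abstract describes can be sketched roughly as follows. This is a minimal illustration in which random vectors stand in for SentenceBERT article embeddings, and the linkage method and distance threshold are assumed settings, not necessarily the paper's exact configuration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Stand-ins for SentenceBERT article embeddings: two planted "stories"
# so the clustering outcome is visible. In practice the vectors would
# come from a sentence-transformers model applied to article text.
rng = np.random.default_rng(0)
story_a = rng.normal(loc=2.0, size=(5, 16))
story_b = rng.normal(loc=-2.0, size=(5, 16))
embeddings = np.vstack([story_a, story_b])

# Agglomerative hierarchical clustering with average linkage over
# cosine distances; the 0.5 cut threshold is an illustrative choice.
Z = linkage(embeddings, method="average", metric="cosine")
chains = fcluster(Z, t=0.5, criterion="distance")

print(chains)  # articles 0-4 and 5-9 should land in two separate story chains
```

The resulting story-chain labels are the kind of input a Fragmentation metric would then compare across users' recommendation lists.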

* NORMalize 2023: The First Workshop on the Normative Design and Evaluation of Recommender Systems, September 19, 2023, co-located with the ACM Conference on Recommender Systems 2023 (RecSys 2023), Singapore  
* Cite published version: Polimeno et al., Improving and Evaluating the Detection of Fragmentation in News Recommendations with the Clustering of News Story Chains, NORMalize 2023: The First Workshop on the Normative Design and Evaluation of Recommender Systems, September 19, 2023, co-located with the ACM Conference on Recommender Systems 2023 (RecSys 2023), Singapore

Better Hit the Nail on the Head than Beat around the Bush: Removing Protected Attributes with a Single Projection

Dec 08, 2022
Pantea Haghighatkhah, Antske Fokkens, Pia Sommerauer, Bettina Speckmann, Kevin Verbeek

Figures 1-4 for Better Hit the Nail on the Head than Beat around the Bush: Removing Protected Attributes with a Single Projection

Bias elimination and recent probing studies attempt to remove specific information from embedding spaces. Here it is important to remove as much of the target information as possible, while preserving any other information present. INLP is a popular recent method which removes specific information through iterative nullspace projections. Multiple iterations, however, increase the risk that information other than the target is negatively affected. We introduce two methods that find a single targeted projection: Mean Projection (MP, more efficient) and Tukey Median Projection (TMP, with theoretical guarantees). Our comparison between MP and INLP shows that (1) one MP projection removes linear separability based on the target and (2) MP has less impact on the overall space. Further analysis shows that applying random projections after MP leads to the same overall effects on the embedding space as the multiple projections of INLP. Applying one targeted (MP) projection hence is methodologically cleaner than applying multiple (INLP) projections that introduce random effects.
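A minimal sketch of the Mean Projection (MP) idea on synthetic data: with a binary protected attribute, MP removes the single direction connecting the two class means. The data, dimensionality, and planted signal here are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Synthetic embeddings with a planted binary protected attribute.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 16))
y = rng.integers(0, 2, size=100)
X[y == 1] += 0.8  # plant a linear protected-attribute signal

# Mean Projection: remove the unit direction between the class means.
u = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
u /= np.linalg.norm(u)
X_clean = X - np.outer(X @ u, u)  # single projection onto u's nullspace

# Along u, no information remains after this one projection.
print(float(np.abs(X_clean @ u).max()))
```

The paper's finding is stronger than what this sketch shows: one such targeted projection already removes linear separability on the target, whereas INLP's multiple iterated projections risk disturbing information unrelated to the target.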

* https://aclanthology.org/2022.emnlp-main.575  
* EMNLP 2022 

Dealing with Abbreviations in the Slovenian Biographical Lexicon

Nov 04, 2022
Angel Daza, Antske Fokkens, Tomaž Erjavec

Figures 1-4 for Dealing with Abbreviations in the Slovenian Biographical Lexicon

Abbreviations present a significant challenge for NLP systems because they cause tokenization and out-of-vocabulary errors. They can also make the text less readable, especially in reference printed books, where they are extensively used. Abbreviations are especially problematic in low-resource settings, where systems are less robust to begin with. In this paper, we propose a new method for addressing the problems caused by a high density of domain-specific abbreviations in a text. We apply this method to the case of a Slovenian biographical lexicon and evaluate it on a newly developed gold-standard dataset of 51 Slovenian biographies. Our abbreviation identification method performs significantly better than commonly used ad-hoc solutions, especially at identifying unseen abbreviations. We also propose and present the results of a method for expanding the identified abbreviations in context.
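For context, the kind of ad-hoc, pattern-based abbreviation detector the paper compares against might look like the sketch below. The regular expression, the length cutoff, and the Slovenian-style example sentence are all assumptions for illustration, not the paper's method or data.

```python
import re

# Hypothetical ad-hoc baseline: flag short tokens ending in a period,
# or letter-period runs like "n.pr."; pattern and cutoff are illustrative.
ABBR = re.compile(r"\b(?:[A-Za-zčšž]{1,4}\.|(?:[A-Za-zčšž]\.){2,})")

text = "Rodil se je l. 1870 v Ljubljani, kjer je obiskoval gimn. in ur. časopis."
print(ABBR.findall(text))  # picks up "l.", "gimn.", "ur."
```

Heuristics like this miss unseen abbreviations and can over-trigger on short sentence-final words, which is the gap a learned identification method addresses.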

* To be presented at The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022) 

Perturbations and Subpopulations for Testing Robustness in Token-Based Argument Unit Recognition

Sep 29, 2022
Jonathan Kamp, Lisa Beinborn, Antske Fokkens

Figures 1-4 for Perturbations and Subpopulations for Testing Robustness in Token-Based Argument Unit Recognition

Argument Unit Recognition and Classification aims at identifying argument units from text and classifying them as pro or against. One of the design choices that need to be made when developing systems for this task is the unit of classification: segments of tokens or full sentences. Previous research suggests that fine-tuning language models at the token level yields more robust sentence classification than training on sentences directly. We reproduce the study that originally made this claim and further investigate what exactly token-based systems learn better than sentence-based ones. We develop systematic tests for analysing the behavioural differences between the token-based and the sentence-based system. Our results show that token-based models are generally more robust than sentence-based models, both on manually perturbed examples and on specific subpopulations of the data.
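As background for the token-vs-sentence design choice, a token-based system still needs sentence-level labels for the comparison. One plausible aggregation rule (hypothetical, not necessarily the paper's) is a majority vote over BIO-style token predictions:

```python
from collections import Counter

def sentence_label_from_tokens(token_labels):
    """Aggregate BIO-style token predictions into one sentence label:
    majority stance among argumentative tokens, else 'non-arg'."""
    stances = [t.split("-")[-1] for t in token_labels if t != "O"]
    if not stances:
        return "non-arg"
    return Counter(stances).most_common(1)[0][0]

print(sentence_label_from_tokens(["O", "B-pro", "I-pro", "I-pro", "O"]))  # pro
print(sentence_label_from_tokens(["O", "O", "O"]))  # non-arg
```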

* Accepted at the 9th Workshop on Argument Mining, co-located with COLING 2022. Please cite the published version when available 

Hate Speech Criteria: A Modular Approach to Task-Specific Hate Speech Definitions

Jun 30, 2022
Urja Khurana, Ivar Vermeulen, Eric Nalisnick, Marloes van Noorloos, Antske Fokkens

Figures 1-3 for Hate Speech Criteria: A Modular Approach to Task-Specific Hate Speech Definitions

Offensive Content Warning: This paper contains offensive language, only to provide examples that clarify the research; these examples do not reflect the authors' opinions. Please be aware that they are offensive and may cause you distress. The subjectivity of recognizing hate speech makes it a complex task. This is also reflected by differing and incomplete definitions in NLP. We present hate speech criteria, developed with perspectives from law and social science, with the aim of helping researchers create more precise definitions and annotation guidelines covering five aspects: (1) target groups, (2) dominance, (3) perpetrator characteristics, (4) type of negative group reference, and (5) type of potential consequences/effects. Definitions can be structured to cover a broader or narrower phenomenon; as such, conscious choices can be made about specifying criteria or leaving them open. We argue that the goal and exact task developers have in mind should determine how the scope of hate speech is defined. We provide an overview of the properties of English datasets from hatespeechdata.com that may help select the most suitable dataset for a specific scenario.

* Accepted at WOAH 2022, co-located at NAACL 2022. Cite ACL version 

How Emotionally Stable is ALBERT? Testing Robustness with Stochastic Weight Averaging on a Sentiment Analysis Task

Nov 18, 2021
Urja Khurana, Eric Nalisnick, Antske Fokkens

Figures 1-4 for How Emotionally Stable is ALBERT? Testing Robustness with Stochastic Weight Averaging on a Sentiment Analysis Task

Despite their success, modern language models are fragile. Even small changes in their training pipeline can lead to unexpected results. We study this phenomenon by examining the robustness of ALBERT (arXiv:1909.11942) in combination with Stochastic Weight Averaging (SWA) (arXiv:1803.05407) -- a cheap way of ensembling -- on a sentiment analysis task (SST-2). In particular, we analyze SWA's stability via CheckList criteria (arXiv:2005.04118), examining the agreement on errors made by models differing only in their random seed. We hypothesize that SWA is more stable because it ensembles model snapshots taken along the gradient descent trajectory. We quantify stability by comparing the models' mistakes with Fleiss' Kappa (Fleiss, 1971) and overlap ratio scores. We find that SWA reduces error rates in general; yet the models still suffer from their own distinct biases (according to CheckList).
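Fleiss' kappa, used above to quantify agreement on model errors, has a standard closed form that can be sketched directly; the toy agreement matrix below is illustrative, not the paper's data.

```python
import numpy as np

def fleiss_kappa(ratings):
    """Fleiss' kappa for a (subjects x categories) count matrix;
    ratings[i, j] = number of raters assigning subject i to category j."""
    ratings = np.asarray(ratings, dtype=float)
    n = ratings.sum(axis=1)[0]                 # raters per subject (constant)
    p_j = ratings.sum(axis=0) / ratings.sum()  # overall category shares
    P_i = (np.sum(ratings ** 2, axis=1) - n) / (n * (n - 1))
    P_bar, P_e = P_i.mean(), np.sum(p_j ** 2)
    return (P_bar - P_e) / (1 - P_e)

# Three model runs judged on whether each of four examples is an error:
# full agreement between the raters yields kappa = 1.
print(fleiss_kappa([[3, 0], [0, 3], [3, 0], [0, 3]]))  # 1.0
```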

* Accepted at the second workshop on Evaluation & Comparison of NLP Systems, co-located at EMNLP 2021. Cite ACL version 

Is Stance Detection Topic-Independent and Cross-topic Generalizable? -- A Reproduction Study

Oct 14, 2021
Myrthe Reuver, Suzan Verberne, Roser Morante, Antske Fokkens

Figures 1-4 for Is Stance Detection Topic-Independent and Cross-topic Generalizable? -- A Reproduction Study

Cross-topic stance detection is the task of automatically detecting stances (pro, against, or neutral) on unseen topics. We successfully reproduce state-of-the-art cross-topic stance detection work (Reimers et al., 2019) and systematically analyze its reproducibility. Our attention then turns to the cross-topic aspect of this work, and to the specificity of topics in terms of vocabulary and socio-cultural context. We ask: to what extent is stance detection topic-independent and generalizable across topics? We compare the model's performance on various unseen topics and find that topic (e.g. abortion, cloning), class (e.g. pro, con), and their interaction affect the model's performance. We conclude that investigating performance on different topics, and addressing topic-specific vocabulary and context, is a future avenue for cross-topic stance detection.
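The cross-topic setup described above amounts to leave-one-topic-out evaluation; a sketch of such a split, with a hypothetical data layout rather than the paper's actual dataset, could look like this:

```python
def cross_topic_splits(examples):
    """Leave-one-topic-out splits: train on all topics but one, test on
    the held-out topic. `examples` holds (text, topic, stance) triples;
    this data layout is a hypothetical stand-in."""
    for held_out in sorted({t for _, t, _ in examples}):
        train = [e for e in examples if e[1] != held_out]
        test = [e for e in examples if e[1] == held_out]
        yield held_out, train, test

data = [("t1", "abortion", "pro"), ("t2", "abortion", "con"),
        ("t3", "cloning", "pro"), ("t4", "nuclear energy", "con")]
for topic, train, test in cross_topic_splits(data):
    print(topic, len(train), len(test))
```

Reporting per-topic scores from such splits is what surfaces the topic and class effects the paper analyzes.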

* Accepted at the 8th Workshop on Argument Mining, 2021 co-located with EMNLP 2021. Cite the published version 

Firearms and Tigers are Dangerous, Kitchen Knives and Zebras are Not: Testing whether Word Embeddings Can Tell

Sep 05, 2018
Pia Sommerauer, Antske Fokkens

Figures 1-4 for Firearms and Tigers are Dangerous, Kitchen Knives and Zebras are Not: Testing whether Word Embeddings Can Tell

This paper presents an approach for investigating the nature of semantic information captured by word embeddings. We propose a method that extends an existing human-elicited semantic property dataset with gold negative examples using crowd judgments. Our experimental approach tests the ability of supervised classifiers to identify semantic features in word embedding vectors and compares this to a feature-identification method based on full vector cosine similarity. The idea behind this method is that properties identified by classifiers, but not through full vector comparison, are captured by embeddings; properties that cannot be identified by either method are not. Our results provide an initial indication that semantic properties relevant for the way entities interact (e.g. dangerous) are captured, while perceptual information (e.g. colors) is not represented. We conclude that, though preliminary, these results show that our method is suitable for identifying which properties are captured by embeddings.
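The contrast between a supervised classifier and full-vector cosine similarity can be illustrated on synthetic data: a weak "property" direction that barely moves cosine neighbourhoods is still recoverable by a linear classifier. Everything below (dimensions, signal strength, the least-squares fit) is an assumption for the sketch, not the paper's protocol.

```python
import numpy as np

# Synthetic "embeddings" with one weak property dimension.
rng = np.random.default_rng(3)
n, d = 200, 50
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d))
X[:, 0] += (2 * y - 1) * 1.5  # weak planted property signal

# Full-vector cosine: how often does the nearest neighbour share the
# property? With a weak signal this stays well short of perfect.
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
sims = Xn @ Xn.T
np.fill_diagonal(sims, -np.inf)
nn_purity = float(np.mean(y[sims.argmax(axis=1)] == y))

# A supervised linear probe (least-squares fit) recovers it well.
w, *_ = np.linalg.lstsq(X, 2 * y - 1, rcond=None)
acc = float(np.mean((X @ w > 0) == (y == 1)))

print(round(nn_purity, 2), round(acc, 2))
```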

* Accepted to the EMNLP workshop "Analyzing and interpreting neural networks for NLP" 

BiographyNet: Extracting Relations Between People and Events

Jan 22, 2018
Antske Fokkens, Serge ter Braake, Niels Ockeloen, Piek Vossen, Susan Legêne, Guus Schreiber, Victor de Boer

Figures 1-4 for BiographyNet: Extracting Relations Between People and Events

This paper describes BiographyNet, a digital humanities project (2012-2016) that brings together researchers from history, computational linguistics and computer science. The project uses data from the Biography Portal of the Netherlands (BPN), which contains approximately 125,000 biographies from a variety of Dutch biographical dictionaries from the eighteenth century until now, describing around 76,000 individuals. BiographyNet's aim is to strengthen the value of the portal and of comparable biographical datasets for historical research, by improving the search options and the presentation of results, with a historically justified NLP pipeline made accessible through a user-evaluated demonstrator. The project's main target group is professional historians. The project therefore worked with two key concepts: "provenance", understood as a term allowing both for historical source criticism and for references to data-management and programming interventions in digitized sources; and "perspective", interpreted as the inherent uncertainty in the interpretation of historical results.

* 32 pages, 5 figures. Á. Z. Bernád, C. Gruber, M. Kaiser (editors), Europa baut auf Biographien: Aspekte, Bausteine, Normen und Standards für eine europäische Biographik (2017)