
Lea Frermann


Connecting the Dots in News Analysis: A Cross-Disciplinary Survey of Media Bias and Framing

Sep 14, 2023
Gisela Vallejo, Timothy Baldwin, Lea Frermann

The manifestation and effect of bias in news reporting have been central topics in the social sciences for decades, and have received increasing attention in the NLP community recently. While NLP can help to scale up analyses or contribute automatic procedures to investigate the impact of biased news in society, we argue that the currently dominant methodologies fall short of addressing the complex questions and effects examined in theoretical media studies. In this survey paper, we review social science approaches and compare them with the typical task formulations, methods, and evaluation metrics used in the analysis of media bias in NLP. We discuss open questions and suggest possible directions for closing the identified gaps between theory on the one hand, and predictive models and their evaluation on the other. These include model transparency, consideration of document-external information, and cross-document reasoning rather than single-label assignment.

Conflicts, Villains, Resolutions: Towards models of Narrative Media Framing

Jun 03, 2023
Lea Frermann, Jiatong Li, Shima Khanehzar, Gosia Mikolajczak

Despite increasing interest in the automatic detection of media frames in NLP, the problem is typically simplified to single-label classification with a topic-like view of frames, sidestepping the broader document-level narrative. In this work, we revisit a widely used conceptualization of framing from the communication sciences which explicitly captures elements of narratives, including conflict and its resolution, and integrate it with the narrative framing of key entities in the story as heroes, victims, or villains. We adapt an effective annotation paradigm that breaks a complex annotation task into a series of simpler binary questions, and present an annotated data set of English news articles together with a case study on the framing of climate change in articles from news outlets across the political spectrum. Finally, we explore automatic multi-label prediction of our frames with supervised and semi-supervised approaches, and present a novel retrieval-based method which is both effective and transparent in its predictions. We conclude with a discussion of opportunities and challenges for future work on document-level models of narrative framing.
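The retrieval-based predictor itself is not reproduced here; below is a minimal sketch of what a transparent, retrieval-based multi-label frame predictor could look like, assuming a TF-IDF representation, a toy frame inventory, and a simple neighbour-voting threshold (frame names, data, and threshold are illustrative, not the authors' setup).

```python
# Hedged sketch (not the authors' code): a k-NN retrieval baseline for
# multi-label narrative-frame prediction. Labelled articles are embedded with
# TF-IDF; a test article inherits a frame if enough of its nearest annotated
# neighbours carry that frame, which keeps predictions inspectable.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

FRAMES = ["conflict", "resolution", "hero", "victim", "villain"]  # assumed label set

train_texts = [
    "Activists clash with the government over new coal permits.",
    "Engineers unveil a storage breakthrough that may end the standoff.",
    "Farmers are left to bear the cost of the prolonged drought.",
]
# Multi-hot frame annotations aligned with train_texts (toy values).
train_labels = np.array([
    [1, 0, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
])

vectorizer = TfidfVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train_texts)
index = NearestNeighbors(n_neighbors=2, metric="cosine").fit(X_train)

def predict_frames(text: str, threshold: float = 0.5) -> dict:
    """Return per-frame scores as label frequencies among retrieved neighbours."""
    x = vectorizer.transform([text])
    _, idx = index.kneighbors(x)
    votes = train_labels[idx[0]].mean(axis=0)   # fraction of neighbours per frame
    return {f: (float(p), bool(p >= threshold)) for f, p in zip(FRAMES, votes)}

print(predict_frames("Coastal towns struggle as storms intensify."))
```

Because the prediction is just an aggregation over retrieved, annotated neighbours, the supporting articles can be shown alongside each predicted frame, which is where the transparency comes from.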

* To appear in ACL 2023 (main conference) 

A Large-Scale Multilingual Study of Visual Constraints on Linguistic Selection of Descriptions

Feb 09, 2023
Uri Berger, Lea Frermann, Gabriel Stanovsky, Omri Abend

We present a large-scale multilingual study of how vision constrains linguistic choice, covering four languages and five linguistic properties, such as verb transitivity and the use of numerals. We propose a novel method that leverages existing corpora of images with captions written by native speakers, and apply it to nine corpora comprising 600k images and 3M captions. We study the relation between visual input and linguistic choices by training classifiers to predict the probability of expressing a property from raw images, and find evidence supporting the claim that linguistic properties are constrained by visual context across languages. We complement this investigation with a corpus study, taking numerals as a test case. Specifically, we use existing annotations (number or type of objects) to investigate the effect of different visual conditions on the use of numeral expressions in captions, and show that similar patterns emerge across languages. Our methods and findings both confirm and extend existing research in the cognitive literature. We additionally discuss possible applications for language generation.
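As an illustration of the probing setup described above (not the paper's code), the sketch below trains a binary classifier to predict from raw pixels whether captions of an image are likely to express a given property, e.g. contain a numeral; the backbone, input size, and training details are assumptions.

```python
# Hedged sketch: a binary image probe for "does a caption of this image
# express property X?". Real training would use the image-caption corpora
# described above; here we run one toy optimisation step on random data.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class PropertyProbe(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = resnet18(weights=None)            # raw pixels in, no pretrained labels
        backbone.fc = nn.Linear(backbone.fc.in_features, 1)
        self.net = backbone

    def forward(self, images):                       # images: (B, 3, 224, 224)
        return self.net(images).squeeze(-1)          # logit of P(property expressed)

model = PropertyProbe()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()

# Toy batch: images paired with a label marking whether a native-speaker
# caption used, e.g., a numeral for that image.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,)).float()

loss = loss_fn(model(images), labels)
loss.backward()
optimizer.step()
print(f"toy training loss: {loss.item():.3f}")
```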

* Accepted to EACL 2023 Findings 

Professional Presentation and Projected Power: A Case Study of Implicit Gender Information in English CVs

Nov 17, 2022
Jinrui Yang, Sheilla Njoto, Marc Cheong, Leah Ruppanner, Lea Frermann

Gender discrimination in hiring is a pertinent and persistent bias in society, and a common motivating example for exploring bias in NLP. However, the manifestation of gendered language in application materials has received limited attention. This paper investigates the framing of skills and background in the CVs of self-identified men and women. We introduce a data set of 1.8K authentic, English-language CVs from the US, covering 16 occupations, allowing us to partially control for the confound of occupation-specific gender base rates. We find that (1) women use more verbs evoking impressions of low power; and (2) classifiers capture gender signal even after data balancing and the removal of pronouns and named entities, and this holds for both transformer-based and linear classifiers.
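A minimal sketch of this kind of analysis, with illustrative data and a pronoun list standing in for the full anonymisation pipeline (the paper additionally removes named entities and balances the data per occupation):

```python
# Hedged sketch (not the authors' pipeline): scrub explicit gender cues from
# CV text, then test whether a linear classifier still recovers gender.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

PRONOUNS = r"\b(he|him|his|she|her|hers)\b"

def scrub(cv_text: str) -> str:
    """Drop gendered pronouns; a full pipeline would also mask named entities."""
    return re.sub(PRONOUNS, " ", cv_text, flags=re.IGNORECASE)

cvs = [
    "She assisted the team and supported daily operations.",
    "He led the division and drove revenue growth.",
    "She coordinated schedules and helped organise events.",
    "He managed key accounts and negotiated contracts.",
]
genders = ["W", "M", "W", "M"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit([scrub(t) for t in cvs], genders)

# With pronouns removed, any remaining signal comes from framing, e.g. low-
# vs high-power verbs such as "assisted"/"helped" vs "led"/"drove".
print(clf.predict([scrub("They supported stakeholders and assisted with reporting.")]))
```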

* Accepted at the NLP+CSS 2022 workshop (co-located with EMNLP) 

Systematic Evaluation of Predictive Fairness

Oct 17, 2022
Xudong Han, Aili Shen, Trevor Cohn, Timothy Baldwin, Lea Frermann

Mitigating bias when training on biased datasets is an important open problem. Several techniques have been proposed, but the typical evaluation regime is very limited, considering only narrow data conditions. For instance, the effects of target class imbalance and stereotyping are under-studied. To address this gap, we examine the performance of various debiasing methods across multiple tasks, spanning binary classification (Twitter sentiment), multi-class classification (profession prediction), and regression (valence prediction). Through extensive experimentation, we find that data conditions have a strong influence on relative model performance, and that general conclusions cannot be drawn about method efficacy when evaluating only on standard datasets, as is current practice in fairness research.
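To make the notion of "data conditions" concrete, the sketch below (not the paper's protocol) generates synthetic splits with a controllable class imbalance and a controllable label/protected-attribute correlation ("stereotyping"), then reports accuracy and a TPR gap for a stand-in classifier under each condition; all numbers are illustrative.

```python
# Hedged sketch: sweep class imbalance and stereotyping rate, and observe how
# accuracy and the true-positive-rate gap of a fixed predictor change.
import numpy as np

rng = np.random.default_rng(0)

def make_split(n: int, pos_rate: float, stereotype: float):
    """y ~ Bernoulli(pos_rate); protected z copies y with prob `stereotype`."""
    y = (rng.random(n) < pos_rate).astype(int)
    flip = rng.random(n) >= stereotype
    z = np.where(flip, rng.integers(0, 2, n), y)
    x = y + 0.8 * z + rng.normal(0, 1, n)        # feature leaks both y and z
    return x.reshape(-1, 1), y, z

def tpr_gap(y_true, y_pred, z):
    gaps = [(y_pred[(y_true == 1) & (z == g)] == 1).mean() for g in (0, 1)]
    return abs(gaps[0] - gaps[1])

for pos_rate in (0.5, 0.1):
    for stereotype in (0.0, 0.8):
        X, y, z = make_split(5000, pos_rate, stereotype)
        y_pred = (X[:, 0] > 0.5).astype(int)     # stand-in for a trained model
        acc = (y_pred == y).mean()
        print(f"pos_rate={pos_rate} stereotype={stereotype} "
              f"acc={acc:.2f} tpr_gap={tpr_gap(y, y_pred, z):.2f}")
```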

* AACL 2022 

A Computational Acquisition Model for Multimodal Word Categorization

May 12, 2022
Uri Berger, Gabriel Stanovsky, Omri Abend, Lea Frermann

Recent advances in self-supervised modeling of text and images open new opportunities for computational models of child language acquisition, which is believed to rely heavily on cross-modal signals. However, prior studies have been limited by their reliance on vision models trained on large image datasets annotated with a pre-defined set of depicted object categories. This is (a) not faithful to the information children receive and (b) prohibits the evaluation of such models with respect to category learning tasks, due to the pre-imposed category structure. We address this gap, and present a cognitively-inspired, multimodal acquisition model, trained from image-caption pairs on naturalistic data using cross-modal self-supervision. We show that the model learns word categories and object recognition abilities, and presents trends reminiscent of those reported in the developmental literature. We make our code and trained models public for future reference and use.
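As a rough illustration of cross-modal self-supervision of this kind (not the authors' model), the sketch below trains toy image and caption encoders with a CLIP-style contrastive objective over image-caption pairs; all sizes, encoders, and the temperature are placeholders.

```python
# Hedged sketch: contrastive alignment of image and caption embeddings,
# trained without any pre-defined object-category labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyAcquisitionModel(nn.Module):
    def __init__(self, dim=64, vocab=1000):
        super().__init__()
        self.image_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim))
        self.text_enc = nn.EmbeddingBag(vocab, dim)   # mean of word embeddings

    def forward(self, images, token_ids):
        img = F.normalize(self.image_enc(images), dim=-1)
        txt = F.normalize(self.text_enc(token_ids), dim=-1)
        return img @ txt.T                            # pairwise similarities

model = TinyAcquisitionModel()
images = torch.randn(16, 3, 32, 32)                   # toy image batch
token_ids = torch.randint(0, 1000, (16, 12))          # toy caption token ids

logits = model(images, token_ids) / 0.07              # temperature
targets = torch.arange(16)                            # matching pairs on the diagonal
loss = (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2
loss.backward()
# After training, word categories can be probed by clustering the learned word
# embeddings, and object recognition by nearest-caption retrieval.
```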

* Accepted to NAACL 2022 

Optimising Equal Opportunity Fairness in Model Training

May 05, 2022
Aili Shen, Xudong Han, Trevor Cohn, Timothy Baldwin, Lea Frermann

Real-world datasets often encode stereotypes and societal biases. Such biases can be implicitly captured by trained models, leading to biased predictions and exacerbating existing societal preconceptions. Existing debiasing methods, such as adversarial training and removing protected information from representations, have been shown to reduce bias. However, a disconnect between fairness criteria and training objectives makes it difficult to reason theoretically about the effectiveness of different techniques. In this work, we propose two novel training objectives which directly optimise for the widely-used criterion of {\it equal opportunity}, and show that they are effective in reducing bias while maintaining high performance over two classification tasks.
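The paper's exact objectives are not reproduced here; the sketch below shows one generic way to fold an equal-opportunity term into training, penalising the gap in soft true-positive rates between two protected groups (the form of the penalty and its weighting are assumptions).

```python
# Hedged sketch: equal opportunity asks for equal true-positive rates across
# protected groups, so we add to the cross-entropy a differentiable penalty on
# the gap in mean positive-class probability among positive examples per group.
import torch
import torch.nn.functional as F

def eo_regularised_loss(logits, labels, groups, lam=1.0):
    """Cross-entropy + lambda * |soft-TPR(group 0) - soft-TPR(group 1)|."""
    ce = F.cross_entropy(logits, labels)
    probs = logits.softmax(dim=-1)[:, 1]               # P(y_hat = 1)
    group_tprs = []
    for g in (0, 1):
        mask = (labels == 1) & (groups == g)
        if mask.any():
            group_tprs.append(probs[mask].mean())
    eo_gap = (group_tprs[0] - group_tprs[1]).abs() if len(group_tprs) == 2 \
        else torch.tensor(0.0)
    return ce + lam * eo_gap

# Toy batch: binary task, binary protected attribute.
logits = torch.randn(32, 2, requires_grad=True)
labels = torch.randint(0, 2, (32,))
groups = torch.randint(0, 2, (32,))
eo_regularised_loss(logits, labels, groups).backward()
```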

* Accepted to NAACL 2022 main conference 

fairlib: A Unified Framework for Assessing and Improving Classification Fairness

May 04, 2022
Xudong Han, Aili Shen, Yitong Li, Lea Frermann, Timothy Baldwin, Trevor Cohn

This paper presents fairlib, an open-source framework for assessing and improving classification fairness. It provides a systematic framework for quickly reproducing existing baseline models, developing new methods, evaluating models with different metrics, and visualizing their results. Its modularity and extensibility enable the framework to be used with diverse types of inputs, including natural language, images, and audio. In total, we implement 14 debiasing methods, spanning pre-processing, at-training-time, and post-processing approaches. The built-in metrics cover the most commonly used fairness criteria and can be further generalized and customized for fairness evaluation.
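Independently of fairlib's actual API (which is not reproduced here), the snippet below sketches the kind of group-fairness quantities such a framework typically reports, e.g. per-group TPR and the resulting equal-opportunity gap, given predictions, gold labels, and protected attributes.

```python
# Hedged sketch: generic group-fairness metrics, not fairlib's implementation.
import numpy as np

def group_metrics(y_true, y_pred, z):
    """Per-group TPR plus aggregate accuracy and the equal-opportunity gap."""
    y_true, y_pred, z = map(np.asarray, (y_true, y_pred, z))
    tprs = {}
    for g in np.unique(z):
        pos = (y_true == 1) & (z == g)
        tprs[int(g)] = float((y_pred[pos] == 1).mean()) if pos.any() else float("nan")
    return {
        "accuracy": float((y_pred == y_true).mean()),
        "tpr_per_group": tprs,
        "eo_gap": max(tprs.values()) - min(tprs.values()),
    }

print(group_metrics(y_true=[1, 1, 0, 1, 0, 1],
                    y_pred=[1, 0, 0, 1, 1, 1],
                    z=     [0, 0, 0, 1, 1, 1]))
```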

* pre-print, 9 pages 

Unsupervised Cross-Lingual Transfer of Structured Predictors without Source Data

Oct 08, 2021
Kemal Kurniawan, Lea Frermann, Philip Schulz, Trevor Cohn

Providing technologies to communities or domains where training data is scarce or protected, e.g., for privacy reasons, is becoming increasingly important. To that end, we generalise methods for unsupervised transfer from multiple input models for structured prediction. We show that the means of aggregating over the input models is critical, and that multiplying marginal probabilities of substructures to obtain high-probability structures for distant supervision is substantially better than taking the union of such structures over the input models, as done in prior work. Testing on 18 languages, we demonstrate that the method works in a cross-lingual setting, considering both dependency parsing and part-of-speech tagging as structured prediction problems. Our analyses show that the proposed method produces less noisy labels for distant supervision.
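A simplified sketch of the aggregation idea, using per-token POS marginals rather than full structured marginals (the tag set, probabilities, and union baseline are illustrative, not the paper's setup):

```python
# Hedged sketch: multiplying marginals (summing log-probabilities) across
# source models rewards tags that every source finds plausible; a union-style
# baseline instead keeps each source's own argmax, which can conflict.
import numpy as np

TAGS = ["NOUN", "VERB", "ADJ"]

# Marginal distributions over tags for 2 target-language tokens,
# from 3 source-language models (toy numbers): shape (models, tokens, tags).
marginals = np.array([
    [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]],
    [[0.5, 0.2, 0.3], [0.3, 0.6, 0.1]],
    [[0.4, 0.4, 0.2], [0.1, 0.8, 0.1]],
])

# Product-of-marginals aggregation, done in log space for stability.
log_product = np.log(marginals).sum(axis=0)                 # (tokens, tags)
silver_tags = [TAGS[i] for i in log_product.argmax(axis=1)]

# Union-style baseline: one (possibly conflicting) label per source model.
union_tags = [{TAGS[i] for i in m.argmax(axis=1)} for m in marginals.transpose(1, 0, 2)]

print("product-aggregated silver labels:", silver_tags)     # less noisy supervision
print("union of per-source labels:      ", union_tags)
```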

Contrastive Learning for Fair Representations

Sep 22, 2021
Aili Shen, Xudong Han, Trevor Cohn, Timothy Baldwin, Lea Frermann

Trained classification models can unintentionally lead to biased representations and predictions, which can reinforce societal preconceptions and stereotypes. Existing debiasing methods for classification models, such as adversarial training, are often expensive to train and difficult to optimise. In this paper, we propose a method for mitigating bias in classifier training by incorporating contrastive learning, in which instances sharing the same class label are encouraged to have similar representations, while instances sharing a protected attribute are forced further apart. In this way, our method learns representations which capture the task label in focused regions while spreading the protected attribute diffusely, limiting its impact on prediction and thereby yielding fairer models. Extensive experimental results across four tasks in NLP and computer vision show (a) that our proposed method achieves fairer representations and larger bias reductions than competitive baselines; (b) that it does so without sacrificing main-task performance; and (c) that it sets a new state of the art on one task despite reducing bias. Finally, our method is conceptually simple, agnostic to network architectures, and incurs minimal additional compute cost.
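A minimal sketch of a contrastive regulariser in this spirit (not the authors' exact loss): representations sharing a class label are attracted, representations sharing a protected attribute are repelled; the temperature and weighting are assumptions.

```python
# Hedged sketch: a fairness-aware contrastive term, added to the main
# cross-entropy task loss during training.
import torch
import torch.nn.functional as F

def fairness_contrastive_loss(reps, labels, protected, tau=0.1, alpha=1.0):
    """reps: (B, d) encoder outputs; labels, protected: (B,) integer tensors."""
    reps = F.normalize(reps, dim=-1)
    sim = reps @ reps.T / tau                            # pairwise similarities
    eye = torch.eye(len(reps), dtype=torch.bool)

    same_label = (labels[:, None] == labels[None, :]) & ~eye
    same_group = (protected[:, None] == protected[None, :]) & ~eye

    attract = -sim[same_label].mean()                    # pull same-class pairs together
    repel = sim[same_group].mean()                       # push same-attribute pairs apart
    return attract + alpha * repel

reps = torch.randn(16, 128, requires_grad=True)          # toy encoder outputs
labels = torch.randint(0, 3, (16,))
protected = torch.randint(0, 2, (16,))
fairness_contrastive_loss(reps, labels, protected).backward()
```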
