Understanding Stereotypes in Language Models: Towards Robust Measurement and Zero-Shot Debiasing

Dec 20, 2022
Justus Mattern, Zhijing Jin, Mrinmaya Sachan, Rada Mihalcea, Bernhard Schölkopf

Generated texts from large pretrained language models have been shown to exhibit a variety of harmful, human-like biases about various demographics. These findings prompted large efforts aiming to understand and measure such effects, with the goal of providing benchmarks that can guide the development of techniques for mitigating these stereotypical associations. However, as recent research has pointed out, the current benchmarks lack a robust experimental setup, which hinders the inference of meaningful conclusions from their evaluation metrics. In this paper, we extend these arguments and demonstrate that existing techniques and benchmarks aiming to measure stereotypes tend to be inaccurate and suffer from a high degree of experimental noise that severely limits the knowledge we can gain from benchmarking language models based on them. Accordingly, we propose a new framework for robustly measuring and quantifying biases exhibited by generative language models. Finally, we use this framework to investigate GPT-3's occupational gender bias and propose prompting techniques for mitigating these biases without the need for fine-tuning.
