StereoSet


Detecting Stereotypes and Anti-stereotypes the Correct Way Using Social Psychological Underpinnings

Apr 04, 2025
Via arXiv

Rethinking Prompt-based Debiasing in Large Language Models

Mar 12, 2025
Via arXiv

BiasEdit: Debiasing Stereotyped Language Models via Model Editing

Mar 11, 2025
Via arXiv

LLMs are Vulnerable to Malicious Prompts Disguised as Scientific Language

Jan 23, 2025
Via arXiv

Mitigating Social Bias in Large Language Models: A Multi-Objective Approach within a Multi-Agent Framework

Dec 20, 2024
Via arXiv

STOP! Benchmarking Large Language Models with Sensitivity Testing on Offensive Progressions

Sep 20, 2024
Via arXiv

BanStereoSet: A Dataset to Measure Stereotypical Social Biases in LLMs for Bangla

Sep 18, 2024
Via arXiv

Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models

Aug 14, 2024
Via arXiv

The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models

Jun 14, 2024
Via arXiv

Investigating Bias Representations in Llama 2 Chat via Activation Steering

Feb 01, 2024
Via arXiv