Ke-Li Chiu

Detecting Hate Speech with GPT-3

Mar 23, 2021

Ke-Li Chiu, Rohan Alexander

Abstract: Sophisticated language models such as OpenAI's GPT-3 can generate hateful text that targets marginalized groups. Given this capacity, we are interested in whether large language models can be used to identify hate speech and classify text as sexist or racist. We use GPT-3 to identify sexist and racist text passages with zero-, one-, and few-shot learning. We find that with zero- and one-shot learning, GPT-3 is able to identify sexist or racist text with an accuracy between 48 per cent and 69 per cent. With few-shot learning and an instruction included in the prompt, the model's accuracy can be as high as 78 per cent. We conclude that large language models have a role to play in hate speech detection, and that with further development language models could be used to counter hate speech and even self-police.

* 15 pages, 1 figure, 8 tables
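The zero-, one-, and few-shot settings described in the abstract differ only in how many labelled examples precede the passage to be classified. The following is a minimal sketch of that idea, not the authors' actual prompts: the `Text:`/`Label:` layout, the function names `build_prompt` and `accuracy`, and the placeholder passages are all assumptions made for illustration. No model is called; the sketch only assembles the prompt and scores predictions.

```python
def build_prompt(examples, passage, instruction=None):
    """Assemble a classification prompt (hypothetical layout).

    examples:    list of (text, label) pairs; empty for zero-shot,
                 one pair for one-shot, several for few-shot.
    passage:     the text the model should label.
    instruction: optional task description placed before the examples.
    """
    lines = []
    if instruction:
        lines.append(instruction)
        lines.append("")
    for text, label in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The final block is left open so the model completes the label.
    lines.append(f"Text: {passage}")
    lines.append("Label:")
    return "\n".join(lines)


def accuracy(predictions, gold):
    """Share of predictions that match the gold labels."""
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


# Placeholder passages stand in for real labelled data.
zero_shot = build_prompt([], "<passage to classify>")
few_shot = build_prompt(
    [("<example passage A>", "sexist"), ("<example passage B>", "neither")],
    "<passage to classify>",
    instruction="Classify each text as sexist, racist, or neither.",
)
```

The completed prompt would then be sent to a language model, and the returned labels compared against hand-labelled data with `accuracy`.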

On consistency scores in text data with an implementation in R

Jan 13, 2021

Ke-Li Chiu, Rohan Alexander

Abstract: In this paper, we introduce a reproducible cleaning process for the text extracted from PDFs using n-gram models. Our approach compares the originally extracted text with the text generated from, or expected by, these models using earlier text as stimulus. To guide this process, we introduce the notion of a consistency score, which refers to the proportion of text that is expected by the model. This is used to monitor changes during the cleaning process, and across different corpuses. We illustrate our process on text from the book Jane Eyre and introduce both a Shiny application and an R package to make our process easier for others to adopt.

* 13 pages, 0 figures
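One way to read the consistency score described in the abstract is as the fraction of words that an n-gram model would have predicted from the preceding text. The sketch below illustrates that reading with a bigram model in Python; it is not the authors' R package. The function names, the `top_k` cut-off for what counts as "expected", and the choice to skip words whose preceding word is unknown to the model are all assumptions.

```python
from collections import Counter, defaultdict


def train_bigrams(tokens):
    """Count, for each word, which words follow it in the reference tokens."""
    following = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        following[prev][nxt] += 1
    return following


def consistency_score(tokens, following, top_k=3):
    """Proportion of checked words that are among the model's top_k
    continuations of the previous word (assumed definition)."""
    checked = 0
    expected = 0
    for prev, actual in zip(tokens, tokens[1:]):
        if prev not in following:
            continue  # the model has no expectation for this context
        checked += 1
        candidates = [word for word, _ in following[prev].most_common(top_k)]
        if actual in candidates:
            expected += 1
    return expected / checked if checked else 0.0


# Toy data: a clean reference text and an extraction with one OCR-style error.
reference = "the cat sat on the mat and the cat ran".split()
model = train_bigrams(reference)
clean = "the cat sat on the mat".split()
noisy = "the c4t sat on the mat".split()

clean_score = consistency_score(clean, model)  # 1.0
noisy_score = consistency_score(noisy, model)  # 0.75
```

Tracking this number before and after each cleaning step gives a rough signal of whether the step made the extracted text more or less like what the model expects.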
