Bertie Vidgen

SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety

Apr 08, 2024
Paul Röttger, Fabio Pernisi, Bertie Vidgen, Dirk Hovy

FinanceBench: A New Benchmark for Financial Question Answering

Nov 20, 2023
Pranab Islam, Anand Kannappan, Douwe Kiela, Rebecca Qian, Nino Scherrer, Bertie Vidgen

SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models

Nov 14, 2023
Bertie Vidgen, Hannah Rose Kirk, Rebecca Qian, Nino Scherrer, Anand Kannappan, Scott A. Hale, Paul Röttger

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values

Oct 11, 2023
Hannah Rose Kirk, Andrew M. Bean, Bertie Vidgen, Paul Röttger, Scott A. Hale

The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models

Oct 03, 2023
Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale

XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models

Aug 02, 2023
Paul Röttger, Hannah Rose Kirk, Bertie Vidgen, Giuseppe Attanasio, Federico Bianchi, Dirk Hovy

Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback

Mar 09, 2023
Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale

SemEval-2023 Task 10: Explainable Detection of Online Sexism

Mar 07, 2023
Hannah Rose Kirk, Wenjie Yin, Bertie Vidgen, Paul Röttger

Is More Data Better? Re-thinking the Importance of Efficiency in Abusive Language Detection with Transformers-Based Active Learning

Sep 21, 2022
Hannah Rose Kirk, Bertie Vidgen, Scott A. Hale

Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models

Jun 20, 2022
Paul Röttger, Haitham Seelawi, Debora Nozza, Zeerak Talat, Bertie Vidgen
