Paul Röttger

University of Oxford

Around the World in 24 Hours: Probing LLM Knowledge of Time and Place

Jun 04, 2025

TrojanStego: Your Language Model Can Secretly Be A Steganographic Privacy Leaking Agent

May 26, 2025

Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals' Subjective Text Perceptions

Feb 28, 2025

IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance

Feb 12, 2025

Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations

Feb 10, 2025

MSTS: A Multimodal Safety Test Suite for Vision-Language Models

Jan 17, 2025

AfriHate: A Multilingual Collection of Hate Speech and Abusive Language Datasets for African Languages

Jan 15, 2025

HateDay: Insights from a Global Hate Speech Dataset Representative of a Day on Twitter

Nov 23, 2024

Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation

Oct 04, 2024

Compromesso! Italian Many-Shot Jailbreaks Undermine the Safety of Large Language Models

Aug 08, 2024