Paul Röttger

University of Oxford

Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation

Sep 10, 2025

No for Some, Yes for Others: Persona Prompts and Other Sources of False Refusal in Language Models

Sep 09, 2025

Principled Personas: Defining and Measuring the Intended Effects of Persona Prompting on Task Performance

Aug 27, 2025

The Pluralistic Moral Gap: Understanding Judgment and Value Differences between Humans and Large Language Models

Jul 23, 2025

Around the World in 24 Hours: Probing LLM Knowledge of Time and Place

Jun 04, 2025

TrojanStego: Your Language Model Can Secretly Be A Steganographic Privacy Leaking Agent

May 26, 2025

Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals' Subjective Text Perceptions

Feb 28, 2025

IssueBench: Millions of Realistic Prompts for Measuring Issue Bias in LLM Writing Assistance

Feb 12, 2025

Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations

Feb 10, 2025

MSTS: A Multimodal Safety Test Suite for Vision-Language Models

Jan 17, 2025