Florian Tramèr

Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition

Jun 12, 2024
Via arXiv

Evaluations of Machine Learning Privacy Defenses are Misleading

Apr 26, 2024
Via arXiv

Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs

Apr 22, 2024
Via arXiv

Privacy Backdoors: Stealing Data with Corrupted Pretrained Models

Mar 30, 2024
Via arXiv

Query-Based Adversarial Prompt Generation

Feb 19, 2024
Via arXiv

Scalable Extraction of Training Data from (Production) Language Models

Nov 28, 2023
Via arXiv

Universal Jailbreak Backdoors from Poisoned Human Feedback

Nov 24, 2023
Via arXiv

Privacy Side Channels in Machine Learning Systems

Sep 11, 2023
Via arXiv

Evaluating Superhuman Models with Consistency Checks

Jun 19, 2023
Via arXiv

Evading Black-box Classifiers Without Breaking Eggs

Jun 05, 2023
Via arXiv