Florian Tramèr

Apertus: Democratizing Open and Compliant LLMs for Global Language Environments

Sep 17, 2025

Design Patterns for Securing LLM Agents against Prompt Injections

Jun 11, 2025

Membership Inference Attacks on Sequence Models

Jun 05, 2025

RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics

May 18, 2025

LLMs unlock new paths to monetizing exploits

May 16, 2025

The Jailbreak Tax: How Useful are Your Jailbreak Outputs?

Apr 14, 2025

Defeating Prompt Injections by Design

Mar 24, 2025

AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses

Mar 03, 2025

Adversarial ML Problems Are Getting Harder to Solve and to Evaluate

Feb 04, 2025

International AI Safety Report

Jan 29, 2025