Florian Tramèr

Measuring Non-Adversarial Reproduction of Training Data in Large Language Models

Nov 15, 2024
Via arXiv

Persistent Pre-Training Poisoning of LLMs

Oct 17, 2024
Via arXiv

Gradient-based Jailbreak Images for Multimodal Fusion Models

Oct 04, 2024
Via arXiv

Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data

Sep 29, 2024
Via arXiv

An Adversarial Perspective on Machine Unlearning for AI Safety

Sep 26, 2024
Via arXiv

Extracting Training Data from Document-Based VQA Models

Jul 11, 2024
Via arXiv

Adversarial Search Engine Optimization for Large Language Models

Jun 26, 2024
Via arXiv

Blind Baselines Beat Membership Inference Attacks for Foundation Models

Jun 23, 2024
Via arXiv

AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents

Jun 19, 2024
Via arXiv

Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition

Jun 12, 2024
Via arXiv