
Florian Tramèr


Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Apr 15, 2024

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Mar 28, 2024

Are aligned neural networks adversarially aligned?

Jun 26, 2023

Increasing Confidence in Adversarial Robustness Evaluations

Jun 28, 2022

The Privacy Onion Effect: Memorization is Relative

Jun 22, 2022

(Certified!!) Adversarial Robustness for Free!

Jun 21, 2022

Debugging Differential Privacy: A Case Study for Privacy Auditing

Mar 28, 2022

Quantifying Memorization Across Neural Language Models

Feb 24, 2022

Membership Inference Attacks From First Principles

Dec 07, 2021

Extracting Training Data from Large Language Models

Dec 14, 2020