Florian Tramer
Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Apr 15, 2024
Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi, Alan Chan, Markus Anderljung, Lilian Edwards, Yoshua Bengio, Danqi Chen, Samuel Albanie, Tegan Maharaj, Jakob Foerster, Florian Tramer, He He, Atoosa Kasirzadeh, Yejin Choi, David Krueger

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Mar 28, 2024
Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramer, Hamed Hassani, Eric Wong

Are aligned neural networks adversarially aligned?

Jun 26, 2023
Nicholas Carlini, Milad Nasr, Christopher A. Choquette-Choo, Matthew Jagielski, Irena Gao, Anas Awadalla, Pang Wei Koh, Daphne Ippolito, Katherine Lee, Florian Tramer, Ludwig Schmidt

Increasing Confidence in Adversarial Robustness Evaluations

Jun 28, 2022
Roland S. Zimmermann, Wieland Brendel, Florian Tramer, Nicholas Carlini

The Privacy Onion Effect: Memorization is Relative

Jun 22, 2022
Nicholas Carlini, Matthew Jagielski, Chiyuan Zhang, Nicolas Papernot, Andreas Terzis, Florian Tramer

(Certified!!) Adversarial Robustness for Free!

Jun 21, 2022
Nicholas Carlini, Florian Tramer, Krishnamurthy Dvijotham, J. Zico Kolter

Debugging Differential Privacy: A Case Study for Privacy Auditing

Mar 28, 2022
Florian Tramer, Andreas Terzis, Thomas Steinke, Shuang Song, Matthew Jagielski, Nicholas Carlini

Quantifying Memorization Across Neural Language Models

Feb 24, 2022
Nicholas Carlini, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Florian Tramer, Chiyuan Zhang
