Alexander Robey

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Mar 28, 2024
Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramer, Hamed Hassani, Eric Wong

Via arXiv

Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation

Mar 28, 2024
Yutong He, Alexander Robey, Naoki Murata, Yiding Jiang, Joshua Williams, George J. Pappas, Hamed Hassani, Yuki Mitsufuji, Ruslan Salakhutdinov, J. Zico Kolter

Via arXiv

A Safe Harbor for AI Evaluation and Red Teaming

Mar 07, 2024
Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, Peter Henderson

Via arXiv

Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing

Feb 28, 2024
Jiabao Ji, Bairu Hou, Alexander Robey, George J. Pappas, Hamed Hassani, Yang Zhang, Eric Wong, Shiyu Chang

Via arXiv

Data-Driven Modeling and Verification of Perception-Based Autonomous Systems

Dec 11, 2023
Thomas Waite, Alexander Robey, Hamed Hassani, George J. Pappas, Radoslav Ivanov

Via arXiv

Jailbreaking Black Box Large Language Models in Twenty Queries

Oct 13, 2023
Patrick Chao, Alexander Robey, Edgar Dobriban, Hamed Hassani, George J. Pappas, Eric Wong

Via arXiv

SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks

Oct 13, 2023
Alexander Robey, Eric Wong, Hamed Hassani, George J. Pappas

Via arXiv

Adversarial Training Should Be Cast as a Non-Zero-Sum Game

Jun 19, 2023
Alexander Robey, Fabian Latorre, George J. Pappas, Hamed Hassani, Volkan Cevher

Via arXiv