Eric Wong

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Mar 28, 2024
Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramer, Hamed Hassani, Eric Wong

Via arXiv

Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing

Feb 28, 2024
Jiabao Ji, Bairu Hou, Alexander Robey, George J. Pappas, Hamed Hassani, Yang Zhang, Eric Wong, Shiyu Chang

Via arXiv

Initialization Matters for Adversarial Transfer Learning

Dec 10, 2023
Andong Hua, Jindong Gu, Zhiyu Xue, Nicholas Carlini, Eric Wong, Yao Qin

Via arXiv

Sum-of-Parts Models: Faithful Attributions for Groups of Features

Oct 25, 2023
Weiqiu You, Helen Qu, Marco Gatti, Bhuvnesh Jain, Eric Wong

Via arXiv

SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation

Oct 19, 2023
Chongyu Fan, Jiancheng Liu, Yihua Zhang, Dennis Wei, Eric Wong, Sijia Liu

Via arXiv

Jailbreaking Black Box Large Language Models in Twenty Queries

Oct 13, 2023
Patrick Chao, Alexander Robey, Edgar Dobriban, Hamed Hassani, George J. Pappas, Eric Wong

Via arXiv

SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks

Oct 13, 2023
Alexander Robey, Eric Wong, Hamed Hassani, George J. Pappas

Via arXiv

Comparing Styles across Languages

Oct 11, 2023
Shreya Havaldar, Matthew Pressimone, Eric Wong, Lyle Ungar

Via arXiv