Eric Wong

Data-Efficient Learning with Neural Programs

Jun 10, 2024

DISCRET: Synthesizing Faithful Explanations For Treatment Effect Estimation

Jun 02, 2024

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Mar 28, 2024

Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing

Feb 28, 2024

Initialization Matters for Adversarial Transfer Learning

Dec 10, 2023

Sum-of-Parts Models: Faithful Attributions for Groups of Features

Oct 25, 2023

SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation

Oct 19, 2023

SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks

Oct 13, 2023

Jailbreaking Black Box Large Language Models in Twenty Queries

Oct 13, 2023

Comparing Styles across Languages

Oct 11, 2023