
Alexander Robey

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Mar 28, 2024
Via arXiv

A Safe Harbor for AI Evaluation and Red Teaming

Mar 07, 2024
Via arXiv

Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing

Feb 28, 2024
Via arXiv

Data-Driven Modeling and Verification of Perception-Based Autonomous Systems

Dec 11, 2023
Via arXiv

SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks

Oct 13, 2023
Via arXiv

Jailbreaking Black Box Large Language Models in Twenty Queries

Oct 13, 2023
Via arXiv

Adversarial Training Should Be Cast as a Non-Zero-Sum Game

Jun 19, 2023
Via arXiv

Probable Domain Generalization via Quantile Risk Minimization

Jul 20, 2022
Via arXiv

Toward Certified Robustness Against Real-World Distribution Shifts

Jun 09, 2022
Via arXiv

Chordal Sparsity for Lipschitz Constant Estimation of Deep Neural Networks

Apr 02, 2022
Via arXiv