Matt Fredrikson

Sales Whisperer: A Human-Inconspicuous Attack on LLM Brand Recommendations

Jun 07, 2024

Improving Alignment and Robustness with Short Circuiting

Jun 06, 2024

VeriSplit: Secure and Practical Offloading of Machine Learning Inferences across IoT Devices

Jun 02, 2024

Efficient LLM Jailbreak via Adaptive Dense-to-sparse Constrained Optimization

May 15, 2024

Transfer Attacks and Defenses for Large Language Models on Coding Tasks

Nov 22, 2023

Is Certifying $\ell_p$ Robustness Still Worthwhile?

Oct 13, 2023

Representation Engineering: A Top-Down Approach to AI Transparency

Oct 10, 2023

A Recipe for Improved Certifiable Robustness: Capacity and Data

Oct 04, 2023

Universal and Transferable Adversarial Attacks on Aligned Language Models

Jul 27, 2023

Scaling in Depth: Unlocking Robustness Certification on ImageNet

Jan 29, 2023