Katarzyna Kapusta

Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models

Mar 08, 2025

DiffGuard: Text-Based Safety Checker for Diffusion Models

Nov 25, 2024

When Federated Learning meets Watermarking: A Comprehensive Overview of Techniques for Intellectual Property Protection

Aug 07, 2023