Michael Backes

Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification

Jul 30, 2024

ICLGuard: Controlling In-Context Learning Behavior for Applicability Authorization

Jul 09, 2024

SOS! Soft Prompt Attack Against Open-Source Large Language Models

Jul 03, 2024

Voice Jailbreak Attacks Against GPT-4o

May 29, 2024

Link Stealing Attacks Against Inductive Graph Neural Networks

May 09, 2024

UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images

May 06, 2024

Rapid Adoption, Hidden Risks: The Dual Impact of Large Language Model Customization

Feb 15, 2024

Comprehensive Assessment of Jailbreak Attacks Against LLMs

Feb 08, 2024

Conversation Reconstruction Attack Against GPT Models

Feb 05, 2024

TrustLLM: Trustworthiness in Large Language Models

Jan 25, 2024