Zhen Xiang

AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases

Jul 17, 2024

GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning

Jun 13, 2024

ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs

Feb 22, 2024

BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models

Jan 20, 2024

CBD: A Certified Backdoor Detector Based on Local Dominant Probability

Oct 26, 2023

Backdoor Mitigation by Correcting the Distribution of Neural Activations

Aug 18, 2023

Improved Activation Clipping for Universal Backdoor Mitigation and Test-Time Detection

Aug 08, 2023

UMD: Unsupervised Model Detection for X2X Backdoor Attacks

Jun 02, 2023

Universal Post-Training Backdoor Detection

May 13, 2022

Post-Training Detection of Backdoor Attacks for Two-Class and Multi-Attack Scenarios

Jan 20, 2022