Xiangyu Qi

Safety Alignment Should Be Made More Than Just a Few Tokens Deep

Jun 10, 2024

AI Risk Management Should Incorporate Both Safety and Security

May 29, 2024

Mitigating Fine-tuning Jailbreak Attack with Backdoor Enhanced Alignment

Feb 27, 2024

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

Feb 07, 2024

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

Oct 05, 2023

BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection

Aug 23, 2023

Visual Adversarial Examples Jailbreak Large Language Models

Jun 22, 2023

Uncovering Adversarial Risks of Test-Time Adaptation

Feb 04, 2023

Fight Poison with Poison: Detecting Backdoor Poison Samples via Decoupling Benign Correlations

May 26, 2022

Circumventing Backdoor Defenses That Are Based on Latent Separability

May 26, 2022