Xiangyu Qi

Mitigating Fine-tuning Jailbreak Attack with Backdoor Enhanced Alignment

Feb 27, 2024
Jiongxiao Wang, Jiazhao Li, Yiquan Li, Xiangyu Qi, Junjie Hu, Yixuan Li, Patrick McDaniel, Muhao Chen, Bo Li, Chaowei Xiao

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

Feb 07, 2024
Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, Peter Henderson

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

Oct 05, 2023
Xiangyu Qi, Yi Zeng, Tinghao Xie, Pin-Yu Chen, Ruoxi Jia, Prateek Mittal, Peter Henderson

BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection

Aug 23, 2023
Tinghao Xie, Xiangyu Qi, Ping He, Yiming Li, Jiachen T. Wang, Prateek Mittal

Visual Adversarial Examples Jailbreak Large Language Models

Jun 22, 2023
Xiangyu Qi, Kaixuan Huang, Ashwinee Panda, Mengdi Wang, Prateek Mittal

Uncovering Adversarial Risks of Test-Time Adaptation

Feb 04, 2023
Tong Wu, Feiran Jia, Xiangyu Qi, Jiachen T. Wang, Vikash Sehwag, Saeed Mahloujifar, Prateek Mittal

Fight Poison with Poison: Detecting Backdoor Poison Samples via Decoupling Benign Correlations

May 26, 2022
Xiangyu Qi, Tinghao Xie, Saeed Mahloujifar, Prateek Mittal

Circumventing Backdoor Defenses That Are Based on Latent Separability

May 26, 2022
Xiangyu Qi, Tinghao Xie, Saeed Mahloujifar, Prateek Mittal
