Jiongxiao Wang

Mitigating Fine-tuning Jailbreak Attack with Backdoor Enhanced Alignment

Feb 27, 2024
Jiongxiao Wang, Jiazhao Li, Yiquan Li, Xiangyu Qi, Junjie Hu, Yixuan Li, Patrick McDaniel, Muhao Chen, Bo Li, Chaowei Xiao

Via arXiv

Preference Poisoning Attacks on Reward Model Learning

Feb 02, 2024
Junlin Wu, Jiongxiao Wang, Chaowei Xiao, Chenguang Wang, Ning Zhang, Yevgeniy Vorobeychik

Via arXiv

Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations

Nov 16, 2023
Wenjie Mo, Jiashu Xu, Qin Liu, Jiongxiao Wang, Jun Yan, Chaowei Xiao, Muhao Chen

Via arXiv

On the Exploitability of Reinforcement Learning with Human Feedback for Large Language Models

Nov 16, 2023
Jiongxiao Wang, Junlin Wu, Muhao Chen, Yevgeniy Vorobeychik, Chaowei Xiao

Via arXiv

On the Exploitability of Instruction Tuning

Jun 28, 2023
Manli Shu, Jiongxiao Wang, Chen Zhu, Jonas Geiping, Chaowei Xiao, Tom Goldstein

Via arXiv

ChatGPT-powered Conversational Drug Editing Using Retrieval and Domain Feedback

May 29, 2023
Shengchao Liu, Jiongxiao Wang, Yijin Yang, Chengpeng Wang, Ling Liu, Hongyu Guo, Chaowei Xiao

Via arXiv

Adversarial Demonstration Attacks on Large Language Models

May 24, 2023
Jiongxiao Wang, Zichen Liu, Keun Hee Park, Muhao Chen, Chaowei Xiao

Via arXiv

Defending against Adversarial Audio via Diffusion Model

Mar 02, 2023
Shutong Wu, Jiongxiao Wang, Wei Ping, Weili Nie, Chaowei Xiao

Via arXiv

DensePure: Understanding Diffusion Models towards Adversarial Robustness

Nov 01, 2022
Chaowei Xiao, Zhongzhu Chen, Kun Jin, Jiongxiao Wang, Weili Nie, Mingyan Liu, Anima Anandkumar, Bo Li, Dawn Song

Via arXiv