Jiongxiao Wang

Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness
Jun 30, 2024

Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors
May 17, 2024

Mitigating Fine-tuning Jailbreak Attack with Backdoor Enhanced Alignment
Feb 27, 2024

Preference Poisoning Attacks on Reward Model Learning
Feb 02, 2024

Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations
Nov 16, 2023

On the Exploitability of Reinforcement Learning with Human Feedback for Large Language Models
Nov 16, 2023

On the Exploitability of Instruction Tuning
Jun 28, 2023

ChatGPT-powered Conversational Drug Editing Using Retrieval and Domain Feedback
May 29, 2023

Adversarial Demonstration Attacks on Large Language Models
May 24, 2023

Defending against Adversarial Audio via Diffusion Model
Mar 02, 2023