
Jiongxiao Wang

Reinforcement Learning for Self-Improving Agent with Skill Library

Dec 18, 2025

Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset

Nov 05, 2024

FATH: Authentication-based Test-time Defense against Indirect Prompt Injection Attacks

Oct 28, 2024

Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness

Jun 30, 2024

Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors

May 17, 2024

Mitigating Fine-tuning Jailbreak Attack with Backdoor Enhanced Alignment

Feb 27, 2024

Preference Poisoning Attacks on Reward Model Learning

Feb 02, 2024

Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations

Nov 16, 2023

On the Exploitability of Reinforcement Learning with Human Feedback for Large Language Models

Nov 16, 2023

On the Exploitability of Instruction Tuning

Jun 28, 2023