Lei Sha

HauntAttack: When Attack Follows Reasoning as a Shadow
Jun 08, 2025

Towards Harmonized Uncertainty Estimation for Large Language Models
May 25, 2025

AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement
Feb 24, 2025

Be a Multitude to Itself: A Prompt Evolution Framework for Red Teaming
Feb 22, 2025

How Far are LLMs from Being Our Digital Twins? A Benchmark for Persona-Based Behavior Chain Simulation
Feb 20, 2025

Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking
Feb 18, 2025

Plug-and-Play Training Framework for Preference Optimization
Dec 30, 2024

DiffusionAttacker: Diffusion-Driven Prompt Manipulation for LLM Jailbreak
Dec 23, 2024

Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues
Oct 14, 2024

BlackDAN: A Black-Box Multi-Objective Approach for Effective and Contextual Jailbreaking of Large Language Models
Oct 13, 2024