Lipeng He

Defending against Adaptive Prompt Injection Attacks via Reasoning-enabled Task Alignment

Jun 13, 2026
Via arXiv

SoK: Colluding Adversaries in Machine Learning Pipelines

Jun 08, 2026
Via arXiv

Beyond Similarity: Trustworthy Memory Search for Personal AI Agents

Jun 04, 2026
Via arXiv

Backdooring Bias in Large Language Models

Feb 13, 2026
Via arXiv

Understanding and Preserving Safety in Fine-Tuned LLMs

Jan 15, 2026
Via arXiv

Safety at One Shot: Patching Fine-Tuned LLMs with A Single Instance

Jan 06, 2026
Via arXiv

StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs

May 26, 2025
Via arXiv

Activation Approximations Can Incur Safety Vulnerabilities Even in Aligned LLMs: Comprehensive Analysis and Defense

Feb 02, 2025
Via arXiv