Huiyu Xu

LoopTrap: Termination Poisoning Attacks on LLM Agents

May 07, 2026
Via arXiv

Towards LLM Guardrails via Sparse Representation Steering

Mar 21, 2025
Via arXiv

Can Small Language Models Reliably Resist Jailbreak Attacks? A Comprehensive Evaluation

Mar 09, 2025
Via arXiv

RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent

Jul 23, 2024
Via arXiv