Huiyu Xu

Towards LLM Guardrails via Sparse Representation Steering

Mar 21, 2025

Can Small Language Models Reliably Resist Jailbreak Attacks? A Comprehensive Evaluation

Mar 09, 2025

RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent

Jul 23, 2024