Zeqing He

Towards LLM Guardrails via Sparse Representation Steering

Mar 21, 2025
Via arXiv

Can Small Language Models Reliably Resist Jailbreak Attacks? A Comprehensive Evaluation

Mar 09, 2025
Via arXiv