Xuanli He

Segment-Level Coherence for Robust Harmful Intent Probing in LLMs

Apr 16, 2026
Via arXiv

Trojan-Speak: Bypassing Constitutional Classifiers with No Jailbreak Tax via Adversarial Finetuning

Mar 30, 2026
Via arXiv

PiCSAR: Probabilistic Confidence Selection And Ranking

Aug 29, 2025
Via arXiv

GRADA: Graph-based Reranker against Adversarial Documents Attack

May 12, 2025
Via arXiv

Defending Deep Neural Networks against Backdoor Attacks via Module Switching

Apr 08, 2025
Via arXiv

Self-Training Large Language Models for Tool-Use Without Demonstrations

Feb 09, 2025
Via arXiv

Cut the Deadwood Out: Post-Training Model Purification with Selective Module Substitution

Dec 29, 2024
Via arXiv

An Auditing Test To Detect Behavioral Shift in Language Models

Oct 25, 2024
Via arXiv

Analysing the Residual Stream of Language Models Under Knowledge Conflicts

Oct 21, 2024
Via arXiv

Are We Done with MMLU?

Jun 07, 2024
Via arXiv