Yanghao Su

Character as a Latent Variable in Large Language Models: A Mechanistic Account of Emergent Misalignment and Conditional Safety Failures

Jan 30, 2026
Via arXiv

Model X-ray: Detect Backdoored Models via Decision Boundary

Feb 27, 2024
Via arXiv