Picture for Alex Serrano

Alex Serrano

Neural Chameleons: Language Models Can Learn to Hide Their Thoughts from Unseen Activation Monitors

Add code
Dec 12, 2025
Viaarxiv icon

Obfuscated Activations Bypass LLM Latent-Space Defenses

Add code
Dec 12, 2024
Viaarxiv icon