Alex Serrano

Frontier Models Can Take Actions at Low Probabilities

Mar 02, 2026
Via arXiv

Neural Chameleons: Language Models Can Learn to Hide Their Thoughts from Unseen Activation Monitors

Dec 12, 2025
Via arXiv

Obfuscated Activations Bypass LLM Latent-Space Defenses

Dec 12, 2024
Via arXiv