Igor Santos-Grueiro

When Evaluation Becomes a Side Channel: Regime Leakage and Structural Mitigations for Alignment Assessment

Feb 09, 2026
Via arXiv

Alignment Verifiability in Large Language Models: Normative Indistinguishability under Behavioral Evaluation

Feb 05, 2026
Via arXiv