Title: Linear Control of Test Awareness Reveals Differential Compliance in Reasoning Models

May 20, 2025

Sahar Abdelnabi, Ahmed Salem

Abstract: Reasoning-focused large language models (LLMs) sometimes alter their behavior when they detect that they are being evaluated, an effect analogous to the Hawthorne phenomenon. This can lead them to optimize for test-passing performance or to comply more readily with harmful prompts if real-world consequences appear absent. We present the first quantitative study of how such "test awareness" impacts model behavior, particularly its safety alignment. We introduce a white-box probing framework that (i) linearly identifies awareness-related activations and (ii) steers models toward or away from test awareness while monitoring downstream performance. We apply our method to different state-of-the-art open-source reasoning LLMs across both realistic and hypothetical tasks. Our results demonstrate that test awareness significantly impacts safety alignment and that its effect differs across models. By providing fine-grained control over this latent effect, our work aims to increase trust in how we perform safety evaluation.
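
To make the two steps in the abstract concrete, here is a minimal sketch of the general idea of linear probing followed by steering. It is not the authors' implementation: the activations are synthetic, and every name (`make_synthetic_activations`, `train_linear_probe`, `probe_score`, `steer`) and parameter value is a hypothetical choice made for illustration. It assumes that "awareness" shows up as a shift along a single direction in activation space, fits a logistic-regression probe to recover that direction, and then shifts activations along it.

```python
import numpy as np


def make_synthetic_activations(n=400, dim=16, seed=0):
    """Hypothetical stand-in for hidden states: label 1 means 'test-aware'.

    Aware samples are shifted along one hidden unit-norm direction.
    """
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    labels = rng.integers(0, 2, size=n)
    acts = rng.normal(size=(n, dim)) + 3.0 * labels[:, None] * direction
    return acts, labels, direction


def train_linear_probe(acts, labels, lr=0.1, steps=500):
    """Fit a logistic-regression probe with plain gradient descent."""
    n, dim = acts.shape
    w = np.zeros(dim)
    b = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(acts @ w + b)))
        grad = probs - labels  # gradient of the log-loss w.r.t. the logits
        w -= lr * (acts.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def probe_score(acts, w, b):
    """Probe's estimated probability that each activation is 'test-aware'."""
    return 1.0 / (1.0 + np.exp(-(acts @ w + b)))


def steer(acts, w, alpha):
    """Shift activations along the normalized probe direction.

    alpha > 0 pushes toward 'aware', alpha < 0 pushes away from it.
    """
    unit = w / np.linalg.norm(w)
    return acts + alpha * unit


if __name__ == "__main__":
    acts, labels, _ = make_synthetic_activations()
    w, b = train_linear_probe(acts, labels)
    accuracy = ((probe_score(acts, w, b) > 0.5) == labels).mean()
    print("probe accuracy:", accuracy)
    for alpha in (-2.0, 0.0, 2.0):
        print(alpha, probe_score(steer(acts, w, alpha), w, b).mean())
```

In a real model the activations would come from a chosen layer, and the downstream quantity of interest would be the model's behavior (for example, compliance with a prompt) rather than the probe score itself; the sketch only shows the mechanics of finding and moving along a linear direction.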
