Felix Hofstätter

Probing Evaluation Awareness of Language Models

Jul 02, 2025

The Elicitation Game: Evaluating Capability Elicitation Techniques

Feb 04, 2025

Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models

Dec 02, 2024

AI Sandbagging: Language Models can Strategically Underperform on Evaluations

Jun 12, 2024