Kei Nishimura-Gasparian

Towards Understanding Specification Gaming in Reasoning Models

May 04, 2026
Via arXiv

An Independent Safety Evaluation of Kimi K2.5

Apr 03, 2026
Via arXiv

Early Signs of Steganographic Capabilities in Frontier LLMs

Jul 03, 2025
Via arXiv

Auditing language models for hidden objectives

Mar 14, 2025
Via arXiv