Martín Soto

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

Jul 15, 2025
Via arXiv

The Singapore Consensus on Global AI Safety Research Priorities

Jun 25, 2025
Via arXiv

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Feb 25, 2025
Via arXiv

Tell me about yourself: LLMs are aware of their learned behaviors

Jan 19, 2025
Via arXiv