Martín Soto

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

Jul 15, 2025
Via arXiv

The Singapore Consensus on Global AI Safety Research Priorities

Jun 25, 2025
Via arXiv

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Feb 25, 2025
Via arXiv

Tell me about yourself: LLMs are aware of their learned behaviors

Jan 19, 2025
Via arXiv