Anna Soligo

Convergent Linear Representations of Emergent Misalignment

Jun 13, 2025

Model Organisms for Emergent Misalignment

Jun 13, 2025

Induced Modularity and Community Detection for Functionally Interpretable Reinforcement Learning

Jan 28, 2025