Jack Lindsey

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

May 28, 2026

From Simulation to Enaction: Post-trained language models recognize and react to their own generations

May 25, 2026

Slot Machines: How LLMs Keep Track of Multiple Entities

Apr 22, 2026

Emotion Concepts and their Function in a Large Language Model

Apr 09, 2026

Mechanisms of Introspective Awareness

Mar 22, 2026

The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

Jan 15, 2026

Emergent Introspective Awareness in Large Language Models

Jan 05, 2026

Auditing language models for hidden objectives

Mar 14, 2025

Open Problems in Mechanistic Interpretability

Jan 27, 2025

Humanity's Last Exam

Jan 24, 2025