Ekdeep Singh Lubana

The Shape of Beliefs: Geometry, Dynamics, and Interventions along Representation Manifolds of Language Models' Posteriors

Feb 02, 2026
Via arXiv

From Isolation to Entanglement: When Do Interpretability Methods Identify and Disentangle Known Concepts?

Dec 17, 2025
Via arXiv

Are language models aware of the road not taken? Token-level uncertainty and hidden state dynamics

Nov 06, 2025
Via arXiv

How Do LLMs Persuade? Linear Probes Can Uncover Persuasion Dynamics in Multi-Turn Conversations

Aug 07, 2025
Via arXiv

Uncovering Conceptual Blindspots in Generative Image Models Using Sparse Autoencoders

Jun 24, 2025
Via arXiv

Detecting High-Stakes Interactions with Activation Probes

Jun 12, 2025
Via arXiv

Evaluating Sparse Autoencoders: From Shallow Design to Matching Pursuit

Jun 05, 2025
Via arXiv

Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry

Mar 03, 2025
Via arXiv

Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models

Feb 18, 2025
Via arXiv

ICLR: In-Context Learning of Representations

Dec 29, 2024
Via arXiv