
Ekdeep Singh Lubana

Detecting High-Stakes Interactions with Activation Probes

Jun 12, 2025
Via arXiv

Evaluating Sparse Autoencoders: From Shallow Design to Matching Pursuit

Jun 05, 2025
Via arXiv

Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry

Mar 03, 2025
Via arXiv

Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models

Feb 18, 2025
Via arXiv

ICLR: In-Context Learning of Representations

Dec 29, 2024
Via arXiv

Competition Dynamics Shape Algorithmic Phases of In-Context Learning

Dec 01, 2024
Via arXiv

Abrupt Learning in Transformers: A Case Study on Matrix Completion

Oct 29, 2024
Via arXiv

Towards Reliable Evaluation of Behavior Steering Interventions in LLMs

Oct 22, 2024
Via arXiv

Representation Shattering in Transformers: A Synthetic Study with Knowledge Editing

Oct 22, 2024
Via arXiv

Analyzing (In)Abilities of SAEs via Formal Languages

Oct 15, 2024
Via arXiv