Ekdeep Singh Lubana

What Makes and Breaks Safety Fine-tuning? A Mechanistic Study

Jul 16, 2024

What Makes and Breaks Safety Fine-tuning? A Mechanistic Study

Jul 14, 2024

Emergence of Hidden Capabilities: Exploring Learning Dynamics in Concept Space

Jun 27, 2024

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Apr 15, 2024

Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model

Feb 12, 2024

FoMo Rewards: Can we cast foundation models as reward functions?

Dec 06, 2023

How Capable Can a Transformer Become? A Study on Synthetic, Interpretable Tasks

Nov 21, 2023

Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks

Nov 21, 2023

In-Context Learning Dynamics with Random Binary Sequences

Oct 26, 2023

Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task

Oct 13, 2023