Jason D. Lee

What One Cannot, Two Can: Two-Layer Transformers Provably Represent Induction Heads on Any-Order Markov Chains
Aug 10, 2025

The Generative Leap: Sharp Sample Complexity for Efficiently Learning Gaussian Multi-Index Models
Jun 05, 2025

Learning Compositional Functions with Transformers from Easy-to-Hard Data
May 29, 2025

Accelerating RL for LLM Reasoning with Optimal Advantage Regression
May 27, 2025

Emergence and scaling laws in SGD learning of shallow neural networks
Apr 28, 2025

What Makes a Reward Model a Good Teacher? An Optimization Perspective
Mar 19, 2025

Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought
Feb 28, 2025

Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension
Feb 07, 2025

Rethinking Addressing in Language Models via Contextualized Equivariant Positional Encoding
Jan 01, 2025

Understanding Factual Recall in Transformers via Associative Memories
Dec 09, 2024