
Yuval Ran-Milo

Attention Sinks Are Provably Necessary in Softmax Transformers: Evidence from Trigger-Conditional Tasks

Mar 12, 2026

Outcome-Based RL Provably Leads Transformers to Reason, but Only With the Right Data

Jan 21, 2026

Do Neural Networks Need Gradient Descent to Generalize? A Theoretical Study

Jun 04, 2025

Mamba Knockout for Unraveling Factual Information Flow

May 30, 2025

Provable Benefits of Complex Parameterizations for Structured State Space Models

Oct 17, 2024