
Siquan Li

The Structural Origin of Attention Sink: Variance Discrepancy, Super Neurons, and Dimension Disparity

May 07, 2026 (via arXiv)

Transformers Are Born Biased: Structural Inductive Biases at Random Initialization and Their Practical Consequences

Feb 05, 2026 (via arXiv)