
Songlin Yang

Publications:

- PaTH Attention: Position Encoding via Accumulating Householder Transformations (May 22, 2025)
- Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free (May 10, 2025)
- Inductive Spatio-Temporal Kriging with Physics-Guided Increment Training Strategy for Air Quality Inference (Mar 12, 2025)
- Textured 3D Regenerative Morphing with 3D Diffusion Prior (Feb 20, 2025)
- ARFlow: Autogressive Flow with Hybrid Linear Attention (Jan 27, 2025)
- Gated Delta Networks: Improving Mamba2 with Delta Rule (Dec 09, 2024)
- Stick-breaking Attention (Oct 23, 2024)
- A Controlled Study on Long Context Extension and Generalization in LLMs (Sep 18, 2024)
- Gated Slot Attention for Efficient Linear-Time Sequence Modeling (Sep 11, 2024)
- Parallelizing Linear Transformers with the Delta Rule over Sequence Length (Jun 10, 2024)