Yihang Gao

GeoNorm: Unify Pre-Norm and Post-Norm with Geodesic Optimization

Jan 29, 2026
Via arXiv

GenAR: Next-Scale Autoregressive Generation for Spatial Gene Expression Prediction

Oct 05, 2025
Via arXiv

SAS: Simulated Attention Score

Jul 10, 2025
Via arXiv

Automatic Rank Determination for Low-Rank Adaptation via Submodular Function Maximization

Jul 02, 2025
Via arXiv

Self-Adjust Softmax

Feb 25, 2025
Via arXiv

Low Tensor-Rank Adaptation of Kolmogorov-Arnold Networks

Feb 10, 2025
Via arXiv

SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator

Dec 16, 2024
Via arXiv

DAPE V2: Process Attention Score as Feature Map for Length Extrapolation

Oct 07, 2024
Via arXiv

CAPE: Context-Adaptive Positional Encoding for Length Extrapolation

May 23, 2024
Via arXiv

On the Expressive Power of a Variant of the Looped Transformer

Feb 21, 2024
Via arXiv