Weigao Sun

Scaling Laws for Linear Complexity Language Models

Jun 24, 2024

Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention

May 27, 2024

Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective

May 27, 2024

HGRN2: Gated Linear RNNs with State Expansion

Apr 11, 2024

Linear Attention Sequence Parallelism

Apr 03, 2024

MS-Net: A Multi-Path Sparse Model for Motion Prediction in Multi-Scenes

Mar 01, 2024

CO2: Efficient Distributed Training with Full Communication-Computation Overlap

Jan 29, 2024

Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

Jan 15, 2024

Scaling TransNormer to 175 Billion Parameters

Jul 27, 2023