Shuming Ma

Q-Sparse: All Large Language Models can be Fully Sparsely-Activated

Jul 15, 2024
Via arXiv

You Only Cache Once: Decoder-Decoder Architectures for Language Models

May 08, 2024
Via arXiv

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Feb 27, 2024
Via arXiv

When an Image is Worth 1,024 x 1,024 Words: A Case Study in Computational Pathology

Dec 06, 2023
Via arXiv

Auto-ICL: In-Context Learning without Human Supervision

Nov 15, 2023
Via arXiv

BitNet: Scaling 1-bit Transformers for Large Language Models

Oct 17, 2023
Via arXiv

Kosmos-2.5: A Multimodal Literate Model

Sep 20, 2023
Via arXiv

Retentive Network: A Successor to Transformer for Large Language Models

Aug 09, 2023
Via arXiv

LongNet: Scaling Transformers to 1,000,000,000 Tokens

Jul 19, 2023
Via arXiv

Kosmos-2: Grounding Multimodal Large Language Models to the World

Jul 13, 2023
Via arXiv