
Wenhui Wang

When an Image is Worth 1,024 x 1,024 Words: A Case Study in Computational Pathology

Dec 06, 2023
Via arXiv

Kosmos-2.5: A Multimodal Literate Model

Sep 20, 2023
Via arXiv

LongNet: Scaling Transformers to 1,000,000,000 Tokens

Jul 19, 2023
Via arXiv

Kosmos-2: Grounding Multimodal Large Language Models to the World

Jul 13, 2023
Via arXiv

Language Is Not All You Need: Aligning Perception with Language Models

Mar 01, 2023
Via arXiv

TorchScale: Transformers at Scale

Nov 23, 2022
Via arXiv

Foundation Transformers

Oct 19, 2022
Via arXiv

Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks

Aug 31, 2022
Via arXiv

Language Models are General-Purpose Interfaces

Jun 13, 2022
Via arXiv

VL-BEiT: Generative Vision-Language Pretraining

Jun 02, 2022
Via arXiv