Saining Xie

SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers

Jan 16, 2024
Via arXiv

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs

Jan 11, 2024
Via arXiv

Image Sculpting: Precise Object Editing with 3D Geometry Control

Jan 02, 2024
Via arXiv

V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs

Dec 26, 2023
Via arXiv

Demystifying CLIP Data

Oct 02, 2023
Via arXiv

Going Denser with Open-Vocabulary Part Segmentation

May 18, 2023
Via arXiv

CiT: Curation in Training for Effective Vision-Language Data

Jan 05, 2023
Via arXiv

ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders

Jan 02, 2023
Via arXiv

Scalable Diffusion Models with Transformers

Dec 19, 2022
Via arXiv

Exploring Long-Sequence Masked Autoencoders

Oct 13, 2022
Via arXiv