Shusheng Yang

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Jun 24, 2024

Qwen Technical Report

Sep 28, 2023

Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

Sep 14, 2023

TouchStone: Evaluating Vision-Language Models by Language Models

Sep 04, 2023

ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers

May 24, 2023

MobileInst: Video Instance Segmentation on the Mobile

Mar 30, 2023

Masked Visual Reconstruction in Language Semantic Space

Jan 17, 2023

Masked Image Modeling with Denoising Contrast

May 19, 2022

Temporally Efficient Vision Transformer for Video Instance Segmentation

Apr 18, 2022

Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection

Apr 06, 2022