Yin Cui

DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model

Jun 02, 2023

Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception

May 10, 2023

Towards Understanding the Effect of Pretraining Label Granularity

Mar 29, 2023

Unified Visual Relationship Detection with Vision and Language Models

Mar 16, 2023

A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models

Feb 13, 2023

F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models

Sep 30, 2022

Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models

Jul 15, 2022

Surrogate Gap Minimization Improves Sharpness-Aware Training

Mar 19, 2022

Open-Vocabulary Image Segmentation

Dec 22, 2021

Towards a Unified Foundation Model: Jointly Pre-Training Transformers on Unpaired Images and Text

Dec 14, 2021