Yin Cui

Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation

Apr 30, 2024

Module-wise Adaptive Distillation for Multimodality Foundation Models

Oct 06, 2023

VideoGLUE: Video General Understanding Evaluation of Foundation Models

Jul 06, 2023

DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model

Jun 02, 2023

Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception

May 10, 2023

Towards Understanding the Effect of Pretraining Label Granularity

Mar 29, 2023

Unified Visual Relationship Detection with Vision and Language Models

Mar 16, 2023

A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models

Feb 13, 2023

F-VLM: Open-Vocabulary Object Detection upon Frozen Vision and Language Models

Sep 30, 2022

Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models

Jul 15, 2022