
Mengdan Zhang

VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models

Jun 14, 2024

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

May 31, 2024

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

Apr 24, 2024

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

Dec 20, 2023

Aligning and Prompting Everything All at Once for Universal Visual Perception

Dec 04, 2023

Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection

Aug 30, 2023

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

Jul 02, 2023

Multi-modal Queried Object Detection in the Wild

May 30, 2023

Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization

Jun 24, 2022

Efficient Decoder-free Object Detection with Transformers

Jun 17, 2022