Chaoyou Fu

VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models

Jun 14, 2024
Via arXiv

Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models

Jun 12, 2024
Via arXiv

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

May 31, 2024
Via arXiv

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

Apr 24, 2024
Via arXiv

No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation

Apr 05, 2024
Via arXiv

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

Dec 20, 2023
Via arXiv

Aligning and Prompting Everything All at Once for Universal Visual Perception

Dec 04, 2023
Via arXiv

ChatIllusion: Efficient-Aligning Interleaved Generation ability with Visual Instruction Model

Nov 29, 2023
Via arXiv

Woodpecker: Hallucination Correction for Multimodal Large Language Models

Oct 24, 2023
Via arXiv

CAPro: Webly Supervised Learning with Cross-Modality Aligned Prototypes

Oct 15, 2023
Via arXiv