Yuanhan Zhang

LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

Jul 17, 2024

LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models

Jul 10, 2024

Long Context Transfer from Language to Vision

Jun 24, 2024

WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning

May 06, 2024

Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward

Apr 02, 2024

VBench: Comprehensive Benchmark Suite for Video Generative Models

Nov 29, 2023

OtterHD: A High-Resolution Multi-modality Model

Nov 07, 2023

Multimodal Foundation Models for Zero-shot Animal Species Recognition in Camera Trap Images

Nov 02, 2023

Octopus: Embodied Vision-Language Programmer from Environmental Feedback

Oct 12, 2023

MMBench: Is Your Multi-modal Model an All-around Player?

Jul 26, 2023