Haodong Duan

VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

Jul 16, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Jul 03, 2024

MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning

Jun 25, 2024

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs

Jun 20, 2024

MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding

Jun 20, 2024

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Jun 06, 2024

MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark

May 20, 2024

Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks

Apr 10, 2024

Are We on the Right Way for Evaluating Large Vision-Language Models?

Apr 09, 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

Apr 09, 2024