Pan Zhang

VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

Jul 16, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Jul 03, 2024

Research on target detection method of distracted driving behavior based on improved YOLOv8

Jul 02, 2024

MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations

Jul 01, 2024

MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs

Jun 17, 2024

V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results

Jun 17, 2024

MotionClone: Training-Free Motion Cloning for Controllable Video Generation

Jun 12, 2024

ShareGPT4Video: Improving Video Understanding and Generation with Better Captions

Jun 06, 2024

Bootstrap3D: Improving 3D Content Creation with Synthetic Data

May 31, 2024

Streaming Long Video Understanding with Large Language Models

May 25, 2024