Jiaqi Wang

VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models

Jul 16, 2024

Tailor3D: Customized 3D Assets Editing and Generation with Dual-Side Images

Jul 08, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Jul 03, 2024

MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations

Jul 01, 2024

SS-Bench: A Benchmark for Social Story Generation and Evaluation

Jun 22, 2024

Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models

Jun 20, 2024

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs

Jun 20, 2024

Effective Generation of Feasible Solutions for Integer Programming via Guided Diffusion

Jun 18, 2024

V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results

Jun 17, 2024

MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs

Jun 17, 2024