Wenpo Song

VisionPangu: A Compact and Fine-Grained Multimodal Assistant with 1.7B Parameters

Mar 05, 2026
Via arXiv

CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era

Mar 16, 2025
Via arXiv

Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model

Aug 02, 2023
Via arXiv