Image Paragraph Captioning


Image paragraph captioning is the task of generating a descriptive paragraph, rather than a single-sentence caption, for an image that contains multiple objects or scenes.

Unblocking Fine-Grained Evaluation of Detailed Captions: An Explaining AutoRater and Critic-and-Revise Pipeline

Jun 09, 2025

LaMP-Cap: Personalized Figure Caption Generation With Multimodal Figure Profiles

Jun 06, 2025

ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs

Jun 11, 2025

VLIS: Unimodal Language Models Guide Multimodal Language Generation

Oct 15, 2023

Enhancing image captioning with depth information using a Transformer-based framework

Jul 24, 2023

mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model

Nov 30, 2023

SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning

Jun 06, 2023

KiUT: Knowledge-injected U-Transformer for Radiology Report Generation

Jun 20, 2023

Bypass Network for Semantics Driven Image Paragraph Captioning

Jun 21, 2022

Reading Radiology Imaging Like The Radiologist

Jul 20, 2023