Image Paragraph Captioning


Image paragraph captioning is the task of generating a descriptive, multi-sentence paragraph for an image that contains multiple objects or scenes.

VLIS: Unimodal Language Models Guide Multimodal Language Generation

Oct 15, 2023

Enhancing image captioning with depth information using a Transformer-based framework

Jul 24, 2023

mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model

Nov 30, 2023

SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning

Jun 06, 2023

Explore and Tell: Embodied Visual Captioning in 3D Environments

Aug 21, 2023

KiUT: Knowledge-injected U-Transformer for Radiology Report Generation

Jun 20, 2023

Reading Radiology Imaging Like The Radiologist

Jul 20, 2023

COSA: Concatenated Sample Pretrained Vision-Language Foundation Model

Jun 15, 2023

Bypass Network for Semantics Driven Image Paragraph Captioning

Jun 21, 2022

Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning

Jun 03, 2022