Image Captioning


Image captioning is the task of generating a textual description of an image. It combines Computer Vision (CV), to interpret the visual content, with Natural Language Processing (NLP), to produce the caption.

BBQ-to-Image: Numeric Bounding Box and Qolor Control in Large-Scale Text-to-Image Models

Feb 24, 2026
Via arXiv

GMAIL: Generative Modality Alignment for generated Image Learning

Feb 17, 2026
Via arXiv

Using Deep Learning to Generate Semantically Correct Hindi Captions

Feb 13, 2026
Via arXiv

Web-Scale Multimodal Summarization using CLIP-Based Semantic Alignment

Feb 16, 2026
Via arXiv

EarthSpatialBench: Benchmarking Spatial Reasoning Capabilities of Multimodal LLMs on Earth Imagery

Feb 17, 2026
Via arXiv

Is Information Density Uniform when Utterances are Grounded on Perception and Discourse?

Feb 16, 2026
Via arXiv

HeatPrompt: Zero-Shot Vision-Language Modeling of Urban Heat Demand from Satellite Images

Feb 23, 2026
Via arXiv

Half-Truths Break Similarity-Based Retrieval

Feb 27, 2026
Via arXiv

OmniScience: A Large-scale Multi-modal Dataset for Scientific Image Understanding

Feb 14, 2026
Via arXiv

D-SECURE: Dual-Source Evidence Combination for Unified Reasoning in Misinformation Detection

Feb 16, 2026
Via arXiv