Image Captioning


Image captioning is the task of generating a textual description of an image. It combines computer vision (CV), which interprets the visual content, with natural language processing (NLP), which produces the caption text.
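As a rough illustration of how the vision and language steps fit together, here is a minimal toy sketch. It is not taken from any of the papers listed below. The names `encode_image`, `NEXT_TOKEN`, and `generate_caption` are hypothetical, and the hand-written lookup table stands in for what a real system would learn with a neural image encoder and a language decoder.

```python
from typing import Dict, List, Tuple

# Assumption: an "image" here is just a grid of grayscale pixel values (0-255).
Image = List[List[int]]


def encode_image(image: Image) -> str:
    """Vision step (toy): reduce the image to one coarse feature."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return "bright" if mean >= 128 else "dark"


# Language step (toy): next token given (image feature, previous token).
# A real decoder would predict this with a learned model, not a fixed table.
NEXT_TOKEN: Dict[Tuple[str, str], str] = {
    ("bright", "<start>"): "a",
    ("bright", "a"): "bright",
    ("bright", "bright"): "image",
    ("bright", "image"): "<end>",
    ("dark", "<start>"): "a",
    ("dark", "a"): "dark",
    ("dark", "dark"): "image",
    ("dark", "image"): "<end>",
}


def generate_caption(image: Image, max_len: int = 10) -> str:
    """Greedy decoding: emit one token at a time until <end> or max_len."""
    feature = encode_image(image)
    tokens: List[str] = []
    prev = "<start>"
    for _ in range(max_len):
        nxt = NEXT_TOKEN.get((feature, prev), "<end>")
        if nxt == "<end>":
            break
        tokens.append(nxt)
        prev = nxt
    return " ".join(tokens)


if __name__ == "__main__":
    print(generate_caption([[255, 255], [200, 250]]))
    print(generate_caption([[0, 10], [5, 20]]))
```

The point of the sketch is only the two-stage shape: the image is first turned into a feature, and the caption is then generated token by token conditioned on that feature.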

Contextualized Visual Personalization in Vision-Language Models

Feb 03, 2026
Via arXiv

Aligning Forest and Trees in Images and Long Captions for Visually Grounded Understanding

Feb 03, 2026
Via arXiv

Generative Engine Optimization: A VLM and Agent Framework for Pinterest Acquisition Growth

Feb 03, 2026
Via arXiv

Bongards at the Boundary of Perception and Reasoning: Programs or Language?

Feb 03, 2026
Via arXiv

Tiled Prompts: Overcoming Prompt Underspecification in Image and Video Super-Resolution

Feb 03, 2026
Via arXiv

DoubleTake: Contrastive Reasoning for Faithful Decision-Making in Medical Imaging

Feb 02, 2026
Via arXiv

Auto-Comp: An Automated Pipeline for Scalable Compositional Probing of Contrastive Vision-Language Models

Feb 02, 2026
Via arXiv

Modeling Image-Caption Rating from Comparative Judgments

Jan 30, 2026
Via arXiv

Brazilian Portuguese Image Captioning with Transformers: A Study on Cross-Native-Translated Dataset

Jan 30, 2026
Via arXiv

Countering the Over-Reliance Trap: Mitigating Object Hallucination for LVLMs via a Self-Validation Framework

Jan 30, 2026
Via arXiv