Image Captioning


Image captioning is the task of generating a textual description of an image. It combines Computer Vision (CV), which interprets the image content, with Natural Language Processing (NLP), which produces the caption text.
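The two-stage idea above (a vision step followed by a language step) can be illustrated with a deliberately tiny sketch. Everything here is hypothetical and for illustration only: `encode_image`, `NEXT_TOKEN`, and `generate_caption` are made-up names, the "vision" stage is just a brightness threshold, and the "language" stage is a hand-written lookup table rather than a trained model. Real captioning systems learn both stages from data.

```python
from typing import Dict, List, Sequence


def encode_image(pixels: Sequence[Sequence[float]]) -> str:
    """Toy 'vision' stage (assumption): reduce an image to one coarse feature.

    Pixels are grayscale intensities in [0, 1]; the feature is 'bright'
    if the mean intensity is at least 0.5, otherwise 'dark'.
    """
    values = [v for row in pixels for v in row]
    mean = sum(values) / len(values)
    return "bright" if mean >= 0.5 else "dark"


# Toy 'language' stage (assumption): a next-token table conditioned on the
# image feature, standing in for a learned text decoder.
NEXT_TOKEN: Dict[str, Dict[str, str]] = {
    "bright": {"<start>": "a", "a": "bright", "bright": "image", "image": "<end>"},
    "dark": {"<start>": "a", "a": "dark", "dark": "image", "image": "<end>"},
}


def generate_caption(pixels: Sequence[Sequence[float]], max_len: int = 10) -> str:
    """Greedy decoding: emit one token at a time until '<end>' or max_len."""
    table = NEXT_TOKEN[encode_image(pixels)]
    token = "<start>"
    words: List[str] = []
    for _ in range(max_len):
        token = table[token]
        if token == "<end>":
            break
        words.append(token)
    return " ".join(words)


if __name__ == "__main__":
    print(generate_caption([[0.9, 0.8], [0.7, 1.0]]))
    print(generate_caption([[0.1, 0.2], [0.0, 0.3]]))
```

The point of the sketch is only the data flow: the image is first reduced to a representation, and the caption is then generated token by token conditioned on that representation.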

CLIP-Map: Structured Matrix Mapping for Parameter-Efficient CLIP Compression

Feb 05, 2026
Via arXiv

Contextualized Visual Personalization in Vision-Language Models

Feb 03, 2026
Via arXiv

Aligning Forest and Trees in Images and Long Captions for Visually Grounded Understanding

Feb 03, 2026
Via arXiv

PromptSplit: Revealing Prompt-Level Disagreement in Generative Models

Feb 03, 2026
Via arXiv

Generative Engine Optimization: A VLM and Agent Framework for Pinterest Acquisition Growth

Feb 03, 2026
Via arXiv

Bongards at the Boundary of Perception and Reasoning: Programs or Language?

Feb 03, 2026
Via arXiv

DoubleTake: Contrastive Reasoning for Faithful Decision-Making in Medical Imaging

Feb 02, 2026
Via arXiv

Tiled Prompts: Overcoming Prompt Underspecification in Image and Video Super-Resolution

Feb 03, 2026
Via arXiv

Auto-Comp: An Automated Pipeline for Scalable Compositional Probing of Contrastive Vision-Language Models

Feb 02, 2026
Via arXiv

Modeling Image-Caption Rating from Comparative Judgments

Jan 30, 2026
Via arXiv