Image Captioning


Image captioning is the task of generating a textual description of an image. It combines Computer Vision (CV), to interpret the visual content, with Natural Language Processing (NLP), to produce the caption.

Generative Score Inference for Multimodal Data

Mar 27, 2026

Label-Free Cross-Task LoRA Merging with Null-Space Compression

Mar 27, 2026

The Limits of Learning from Pictures and Text: Vision-Language Models and Embodied Scene Understanding

Mar 27, 2026

Unlocking Few-Shot Capabilities in LVLMs via Prompt Conditioning and Head Selection

Mar 25, 2026

LLaVA-LE: Large Language-and-Vision Assistant for Lunar Exploration

Mar 25, 2026

No Hard Negatives Required: Concept Centric Learning Leads to Compositionality without Degrading Zero-shot Capabilities of Contrastive Models

Mar 26, 2026

Group Editing: Edit Multiple Images in One Go

Mar 25, 2026

Caption Generation for Dongba Paintings via Prompt Learning and Semantic Fusion

Mar 24, 2026

Interpretable Zero-shot Referring Expression Comprehension with Query-driven Scene Graphs

Mar 26, 2026

The Dual Mechanisms of Spatial Reasoning in Vision-Language Models

Mar 23, 2026