Image Captioning


Image captioning is the task of generating a textual description of an image. It combines Computer Vision (CV), to interpret the image content, with Natural Language Processing (NLP), to produce the caption text.

HalLoc: Token-level Localization of Hallucinations for Vision Language Models

Jun 12, 2025, via arXiv

A Manually Annotated Image-Caption Dataset for Detecting Children in the Wild

Jun 11, 2025, via arXiv

A Novel Lightweight Transformer with Edge-Aware Fusion for Remote Sensing Image Captioning

Jun 11, 2025, via arXiv

Text to Image for Multi-Label Image Recognition with Joint Prompt-Adapter Learning

Jun 12, 2025, via arXiv

ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs

Jun 11, 2025, via arXiv

Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning

Jun 11, 2025, via arXiv

Adding simple structure at inference improves Vision-Language Compositionality

Jun 11, 2025, via arXiv

Improving Personalized Search with Regularized Low-Rank Parameter Updates

Jun 11, 2025, via arXiv

Edit Flows: Flow Matching with Edit Operations

Jun 10, 2025, via arXiv

Dense Retrievers Can Fail on Simple Queries: Revealing The Granularity Dilemma of Embeddings

Jun 10, 2025, via arXiv