Image Captioning


Image captioning is the task of generating a textual description of an image. It combines computer vision (CV), which interprets the image content, with natural language processing (NLP), which produces the caption text.

Change3D: Revisiting Change Detection and Captioning from A Video Modeling Perspective

Mar 24, 2025 (via arXiv)

RGB-Th-Bench: A Dense benchmark for Visual-Thermal Understanding of Vision Language Models

Mar 27, 2025 (via arXiv)

OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement

Mar 21, 2025 (via arXiv)

Empowering Large Language Models with 3D Situation Awareness

Mar 29, 2025 (via arXiv)

Scaling Vision Pre-Training to 4K Resolution

Mar 25, 2025 (via arXiv)

OmniDiff: A Comprehensive Benchmark for Fine-grained Image Difference Captioning

Mar 14, 2025 (via arXiv)

Taxonomic Reasoning for Rare Arthropods: Combining Dense Image Captioning and RAG for Interpretable Classification

Mar 13, 2025 (via arXiv)

Don't Judge Before You CLIP: A Unified Approach for Perceptual Tasks

Mar 17, 2025 (via arXiv)

Aligning Vision to Language: Text-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning

Mar 17, 2025 (via arXiv)

SuperCap: Multi-resolution Superpixel-based Image Captioning

Mar 11, 2025 (via arXiv)