Image Captioning


Image captioning is the process of generating a textual description of an image. It combines Computer Vision (CV), to interpret the image content, with Natural Language Processing (NLP), to produce the caption.

Auto-Vocabulary 3D Object Detection

Dec 18, 2025
Via arXiv

Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning

Dec 09, 2025
Via arXiv

UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture

Dec 25, 2025
Via arXiv

An Efficient and Effective Encoder Model for Vision and Language Tasks in the Remote Sensing Domain

Dec 17, 2025
Via arXiv

SLGNet: Synergizing Structural Priors and Language-Guided Modulation for Multimodal Object Detection

Jan 05, 2026
Via arXiv

SLIM: Semantic-based Low-bitrate Image compression for Machines by leveraging diffusion

Dec 23, 2025
Via arXiv

A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis

Dec 16, 2025
Via arXiv

Towards Open-Vocabulary Industrial Defect Understanding with a Large-Scale Multimodal Dataset

Dec 30, 2025
Via arXiv

Generative diffusion models for agricultural AI: plant image generation, indoor-to-outdoor translation, and expert preference alignment

Dec 22, 2025
Via arXiv

UniHetero: Could Generation Enhance Understanding for Vision-Language-Model at Large Data Scale?

Dec 30, 2025
Via arXiv