Dense Video Captioning


Dense video captioning is the task of localizing multiple events in a video and generating a textual description for each one.

Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Oct 22, 2025

VideoLucy: Deep Memory Backtracking for Long Video Understanding

Oct 14, 2025

Sali4Vid: Saliency-Aware Video Reweighting and Adaptive Caption Retrieval for Dense Video Captioning

Sep 04, 2025

Time-Scaling State-Space Models for Dense Video Captioning

Sep 03, 2025

SAIL-VL2 Technical Report

Sep 18, 2025

SpatialVID: A Large-Scale Video Dataset with Spatial Annotations

Sep 11, 2025

Toward Scalable Video Narration: A Training-free Approach Using Multimodal Large Language Models

Jul 22, 2025

VeS: Teaching Pixels to Listen Without Supervision

Jul 29, 2025

Show, Tell and Summarize: Dense Video Captioning Using Visual Cue Aided Sentence Summarization

Jun 25, 2025

Dense Video Captioning using Graph-based Sentence Summarization

Jun 25, 2025