Dense Video Captioning


Dense video captioning is the task of localizing multiple events in a video and generating a textual description for each one.

ViSIL: Unified Evaluation of Information Loss in Multimodal Video Captioning

Jan 14, 2026

TA-Prompting: Enhancing Video Large Language Models for Dense Video Captioning via Temporal Anchors

Jan 06, 2026

See More, Store Less: Memory-Efficient Resolution for Video Moment Retrieval

Jan 14, 2026

Future Optical Flow Prediction Improves Robot Control & Video Generation

Jan 15, 2026

Klear: Unified Multi-Task Audio-Video Joint Generation

Jan 07, 2026

HiVid-Narrator: Hierarchical Video Narrative Generation with Scene-Primed ASR-anchored Compression

Jan 12, 2026

PrismVAU: Prompt-Refined Inference System for Multimodal Video Anomaly Understanding

Jan 07, 2026

OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding

Dec 29, 2025

EasyV2V: A High-quality Instruction-based Video Editing Framework

Dec 18, 2025

Explicit Temporal-Semantic Modeling for Dense Video Captioning via Context-Aware Cross-Modal Interaction

Nov 13, 2025