Video Captioning


CapRL++: Unified Reinforcement Learning with Verifiable Rewards for Dense Image and Video Captioning

Jun 08, 2026

Audio-Visual Exchange-Aware Token Pruning for Efficient Audio-Visual Captioning

Jun 09, 2026

Towards Accurate Emotion-Attributed Video Captioning via Fine-grained Emotion-Cause Pair Extraction

Jun 07, 2026

OmniCap-IF: Benchmarking and Improving Instruction Following Abilities for Omni-Video Captioning

Jun 07, 2026

CineDance: Towards Next-Generation Multi-Shot Long-Form Cinematic Audio-Video Generation

Jun 08, 2026

Temporal-Aware Reasoning Optimization for Video Temporal Grounding

Jun 08, 2026

ChronoPhyBench: Do MLLMs Truly Understand the World or Merely Exploit Language Priors?

Jun 06, 2026

Never Seen Before: Benchmarking Genuine Zero-Shot Composed Image Retrieval with Consistent Video-Sourced Datasets

Jun 05, 2026

Towards One-to-Many Temporal Grounding

Jun 04, 2026

SVHighlights: Towards Extremely Long Sport Video Highlight Detection

Jun 05, 2026