Video Narration Captioning


MovieTeller: Tool-augmented Movie Synopsis with ID Consistent Progressive Abstraction

Feb 26, 2026

TraceVision: Trajectory-Aware Vision-Language Model for Human-Like Spatial Understanding

Feb 24, 2026

TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions

Feb 09, 2026

HiVid-Narrator: Hierarchical Video Narrative Generation with Scene-Primed ASR-anchored Compression

Jan 12, 2026

Streaming Video Instruction Tuning

Dec 24, 2025

OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory

Dec 08, 2025

NOAH: Benchmarking Narrative Prior driven Hallucination and Omission in Video Large Language Models

Nov 09, 2025

Cinéaste: A Fine-grained Contextual Movie Question Answering Benchmark

Sep 17, 2025

Toward Scalable Video Narration: A Training-free Approach Using Multimodal Large Language Models

Jul 22, 2025

Threading Keyframe with Narratives: MLLMs as Strong Long Video Comprehenders

May 30, 2025