Audio-Visual Video Captioning


FoleyDirector: Fine-Grained Temporal Steering for Video-to-Audio Generation via Structured Scripts

Mar 20, 2026
Via arXiv

Exposing Cross-Modal Consistency for Fake News Detection in Short-Form Videos

Mar 16, 2026
Via arXiv

TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions

Feb 09, 2026
Via arXiv

D-ORCA: Dialogue-Centric Optimization for Robust Audio-Visual Captioning

Feb 08, 2026
Via arXiv

ALIVE: Animate Your World with Lifelike Audio-Video Generation

Feb 09, 2026
Via arXiv

Exploring Physical Intelligence Emergence via Omni-Modal Architecture and Physical Data Engine

Feb 05, 2026
Via arXiv

DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation

Feb 12, 2026
Via arXiv

Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions

Feb 13, 2026
Via arXiv

Klear: Unified Multi-Task Audio-Video Joint Generation

Jan 07, 2026
Via arXiv

OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding

Dec 29, 2025
Via arXiv