Video Synchronization


MVP-LAM: Learning Action-Centric Latent Action via Cross-Viewpoint Reconstruction

Feb 03, 2026
Via arXiv

Omni-Judge: Can Omni-LLMs Serve as Human-Aligned Judges for Text-Conditioned Audio-Video Generation?

Feb 02, 2026
Via arXiv

JoyAvatar: Unlocking Highly Expressive Avatars via Harmonized Text-Audio Conditioning

Jan 31, 2026
Via arXiv

JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion

Jan 29, 2026
Via arXiv

EditYourself: Audio-Driven Generation and Manipulation of Talking Head Videos with Diffusion Transformers

Jan 29, 2026
Via arXiv

InspecSafe-V1: A Multimodal Benchmark for Safety Assessment in Industrial Inspection Scenarios

Jan 29, 2026
Via arXiv

StreamFusion: Scalable Sequence Parallelism for Distributed Inference of Diffusion Transformers on GPUs

Jan 28, 2026
Via arXiv

FreeOrbit4D: Training-Free Arbitrary Camera Redirection for Monocular Videos via Geometry-Complete 4D Reconstruction

Jan 26, 2026
Via arXiv

SkyReels-V3 Technique Report

Jan 24, 2026
Via arXiv

Incorporating Eye-Tracking Signals Into Multimodal Deep Visual Models For Predicting User Aesthetic Experience In Residential Interiors

Jan 23, 2026
Via arXiv