Title: Step-by-Step Video-to-Audio Synthesis via Negative Audio Guidance

Jun 26, 2025

Akio Hayakawa, Masato Ishii, Takashi Shibuya, Yuki Mitsufuji

Abstract: We propose a novel step-by-step video-to-audio generation method that sequentially produces individual audio tracks, each corresponding to a specific sound event in the video. Our approach mirrors traditional Foley workflows, aiming to capture all sound events induced by a given video comprehensively. Each generation step is formulated as a guided video-to-audio synthesis task, conditioned on a target text prompt and previously generated audio tracks. This design is inspired by the idea of concept negation from prior compositional generation frameworks. To enable this guided generation, we introduce a training framework that leverages pre-trained video-to-audio models and eliminates the need for specialized paired datasets, allowing training on more accessible data. Experimental results demonstrate that our method generates multiple semantically distinct audio tracks for a single input video, leading to higher-quality composite audio synthesis than existing baselines.
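
The abstract does not give the guidance formula, so the following is only a rough sketch of how concept negation is commonly written in compositional generation: the model output is pushed toward the text-conditioned prediction and away from the prediction conditioned on already generated audio. The function name `negated_guidance`, its arguments, and the weights `w_text` and `w_neg` are hypothetical and are not taken from the paper, whose actual formulation may differ.

```python
from typing import List, Sequence


def negated_guidance(
    uncond: Sequence[float],
    text_cond: Sequence[float],
    prev_audio_cond: Sequence[float],
    w_text: float = 2.0,   # hypothetical weight pulling toward the text prompt
    w_neg: float = 1.0,    # hypothetical weight pushing away from earlier tracks
) -> List[float]:
    """Combine three per-step model predictions into one guided prediction.

    uncond:          prediction with no text or audio condition
    text_cond:       prediction conditioned on the target text prompt
    prev_audio_cond: prediction conditioned on previously generated audio
    """
    if not (len(uncond) == len(text_cond) == len(prev_audio_cond)):
        raise ValueError("all predictions must have the same length")
    return [
        # Positive term for the prompt, negative term for audio already produced.
        u + w_text * (t - u) - w_neg * (a - u)
        for u, t, a in zip(uncond, text_cond, prev_audio_cond)
    ]


# Toy usage with two-element "predictions" standing in for model outputs.
guided = negated_guidance([0.0, 1.0], [1.0, 1.0], [0.0, 2.0])
print(guided)
```

In a step-by-step setting, a combination like this would be evaluated at every sampling step of each new track, with the earlier tracks supplying the negative condition, so that each new track is steered away from sound events that have already been generated.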
