Yuki Mitsufuji

Attribution-by-design: Ensuring Inference-Time Provenance in Generative Music Systems

Oct 09, 2025
Via arXiv

TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation

Oct 08, 2025
Via arXiv

SONA: Learning Conditional, Unconditional, and Mismatching-Aware Discriminator

Oct 06, 2025
Via arXiv

Demystifying MaskGIT Sampler and Beyond: Adaptive Order Selection in Masked Diffusion

Oct 06, 2025
Via arXiv

SoundReactor: Frame-level Online Video-to-Audio Generation

Oct 02, 2025
Via arXiv

VIRTUE: Visual-Interactive Text-Image Universal Embedder

Oct 01, 2025
Via arXiv

Reverse Engineering of Music Mixing Graphs with Differentiable Processors and Iterative Pruning

Sep 19, 2025
Via arXiv

TITAN-Guide: Taming Inference-Time AligNment for Guided Text-to-Video Diffusion Models

Aug 01, 2025
Via arXiv

Music Arena: Live Evaluation for Text-to-Music

Jul 28, 2025
Via arXiv

Schrödinger Bridge Consistency Trajectory Models for Speech Enhancement

Jul 16, 2025
Via arXiv