Koichi Saito

TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation

Oct 08, 2025
Via arXiv

SoundReactor: Frame-level Online Video-to-Audio Generation

Oct 02, 2025
Via arXiv

Music Arena: Live Evaluation for Text-to-Music

Jul 28, 2025
Via arXiv

Schrödinger Bridge Consistency Trajectory Models for Speech Enhancement

Jul 16, 2025
Via arXiv

Dyadic Mamba: Long-term Dyadic Human Motion Synthesis

May 14, 2025
Via arXiv

Aligning Text-to-Music Evaluation with Human Preferences

Mar 20, 2025
Via arXiv

VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music

Dec 23, 2024
Via arXiv

DisMix: Disentangling Mixtures of Musical Instruments for Source-level Pitch and Timbre Manipulation

Aug 20, 2024
Via arXiv

SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond

Jun 26, 2024
Via arXiv

SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation

May 28, 2024
Via arXiv