Yuki Mitsufuji

Zero- and Few-shot Sound Event Localization and Detection

Sep 17, 2023
Kazuki Shimada, Kengo Uchida, Yuichiro Koyama, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji, Tatsuya Kawahara


VRDMG: Vocal Restoration via Diffusion Posterior Sampling with Multiple Guidance

Sep 13, 2023
Carlos Hernandez-Olivan, Koichi Saito, Naoki Murata, Chieh-Hsin Lai, Marco A. Martínez-Ramirez, Wei-Hsiang Liao, Yuki Mitsufuji


BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network

Sep 06, 2023
Takashi Shibuya, Yuhta Takida, Yuki Mitsufuji


Enhancing Semantic Communication with Deep Generative Models -- An ICASSP Special Session Overview

Sep 05, 2023
Eleonora Grassucci, Yuki Mitsufuji, Ping Zhang, Danilo Comminiello


The Sound Demixing Challenge 2023 – Cinematic Demixing Track

Aug 14, 2023
Stefan Uhlich, Giorgio Fabbro, Masato Hirano, Shusuke Takahashi, Gordon Wichern, Jonathan Le Roux, Dipam Chakraborty, Sharada Mohanty, Kai Li, Yi Luo, Jianwei Yu, Rongzhi Gu, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Mikhail Sukhovei, Yuki Mitsufuji


The Sound Demixing Challenge 2023 – Music Demixing Track

Aug 14, 2023
Giorgio Fabbro, Stefan Uhlich, Chieh-Hsin Lai, Woosung Choi, Marco Martínez-Ramírez, Wei-Hsiang Liao, Igor Gadelha, Geraldo Ramos, Eddie Hsu, Hugo Rodrigues, Fabian-Robert Stöter, Alexandre Défossez, Yi Luo, Jianwei Yu, Dipam Chakraborty, Sharada Mohanty, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Nabarun Goswami, Tatsuya Harada, Minseok Kim, Jun Hyung Lee, Yuanliang Dong, Xinran Zhang, Jiafeng Liu, Yuki Mitsufuji


Automatic Piano Transcription with Hierarchical Frequency-Time Transformer

Jul 10, 2023
Keisuke Toyama, Taketo Akama, Yukara Ikemiya, Yuhta Takida, Wei-Hsiang Liao, Yuki Mitsufuji


STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events

Jun 15, 2023
Kazuki Shimada, Archontis Politis, Parthasaarathy Sudarsanam, Daniel Krause, Kengo Uchida, Sharath Adavanne, Aapo Hakala, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Tuomas Virtanen, Yuki Mitsufuji


On the Equivalence of Consistency-Type Models: Consistency Models, Consistent Diffusion Models, and Fokker-Planck Regularization

Jun 01, 2023
Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Naoki Murata, Yuki Mitsufuji, Stefano Ermon


Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders

May 18, 2023
Hao Shi, Kazuki Shimada, Masato Hirano, Takashi Shibuya, Yuichiro Koyama, Zhi Zhong, Shusuke Takahashi, Tatsuya Kawahara, Yuki Mitsufuji
