Naoya Takahashi

STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events

Jun 15, 2023
Kazuki Shimada, Archontis Politis, Parthasaarathy Sudarsanam, Daniel Krause, Kengo Uchida, Sharath Adavanne, Aapo Hakala, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Tuomas Virtanen, Yuki Mitsufuji

Iteratively Improving Speech Recognition and Voice Conversion

May 24, 2023
Mayank Kumar Singh, Naoya Takahashi, Onoe Naoyuki

The Whole Is Greater than the Sum of Its Parts: Improving DNN-based Music Source Separation

May 13, 2023
Ryosuke Sawata, Naoya Takahashi, Stefan Uhlich, Shusuke Takahashi, Yuki Mitsufuji

Cross-modal Face- and Voice-style Transfer

Mar 01, 2023
Naoya Takahashi, Mayank K. Singh, Yuki Mitsufuji

Nonparallel Emotional Voice Conversion For Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network & Virtual Domain Pairing

Feb 21, 2023
Nirmesh Shah, Mayank Kumar Singh, Naoya Takahashi, Naoyuki Onoe

CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos

Dec 14, 2022
Hao-Wen Dong, Naoya Takahashi, Yuki Mitsufuji, Julian McAuley, Taylor Berg-Kirkpatrick

Hierarchical Diffusion Models for Singing Voice Neural Vocoder

Oct 18, 2022
Naoya Takahashi, Mayank Kumar Singh, Yuki Mitsufuji

DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability

Oct 11, 2022
Kin Wai Cheuk, Ryosuke Sawata, Toshimitsu Uesaka, Naoki Murata, Naoya Takahashi, Shusuke Takahashi, Dorien Herremans, Yuki Mitsufuji

Leveraging Symmetrical Convolutional Transformer Networks for Speech to Singing Voice Style Transfer

Aug 26, 2022
Shrutina Agarwal, Sriram Ganapathy, Naoya Takahashi

STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events

Jun 04, 2022
Archontis Politis, Kazuki Shimada, Parthasaarathy Sudarsanam, Sharath Adavanne, Daniel Krause, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji, Tuomas Virtanen
