Naoya Takahashi

STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events

Jun 15, 2023
Kazuki Shimada, Archontis Politis, Parthasaarathy Sudarsanam, Daniel Krause, Kengo Uchida, Sharath Adavanne, Aapo Hakala, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Tuomas Virtanen, Yuki Mitsufuji

Iteratively Improving Speech Recognition and Voice Conversion

May 24, 2023
Mayank Kumar Singh, Naoya Takahashi, Naoyuki Onoe

The Whole Is Greater than the Sum of Its Parts: Improving DNN-based Music Source Separation

May 13, 2023
Ryosuke Sawata, Naoya Takahashi, Stefan Uhlich, Shusuke Takahashi, Yuki Mitsufuji

Cross-modal Face- and Voice-style Transfer

Mar 01, 2023
Naoya Takahashi, Mayank K. Singh, Yuki Mitsufuji

Nonparallel Emotional Voice Conversion For Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network & Virtual Domain Pairing

Feb 21, 2023
Nirmesh Shah, Mayank Kumar Singh, Naoya Takahashi, Naoyuki Onoe

CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos

Dec 14, 2022
Hao-Wen Dong, Naoya Takahashi, Yuki Mitsufuji, Julian McAuley, Taylor Berg-Kirkpatrick

Hierarchical Diffusion Models for Singing Voice Neural Vocoder

Oct 18, 2022
Naoya Takahashi, Mayank Kumar Singh, Yuki Mitsufuji

DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability

Oct 11, 2022
Kin Wai Cheuk, Ryosuke Sawata, Toshimitsu Uesaka, Naoki Murata, Naoya Takahashi, Shusuke Takahashi, Dorien Herremans, Yuki Mitsufuji

Leveraging Symmetrical Convolutional Transformer Networks for Speech to Singing Voice Style Transfer

Aug 26, 2022
Shrutina Agarwal, Sriram Ganapathy, Naoya Takahashi

STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events

Jun 04, 2022
Archontis Politis, Kazuki Shimada, Parthasaarathy Sudarsanam, Sharath Adavanne, Daniel Krause, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji, Tuomas Virtanen
