Shinji Watanabe

Carnegie Mellon University

VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation

Jun 13, 2024

ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets

Jun 12, 2024

Self-Supervised Speech Representations are More Phonetic than Semantic

Jun 12, 2024

Neural Blind Source Separation and Diarization for Distant Speech Recognition

Jun 12, 2024

EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation

Jun 11, 2024

The Interspeech 2024 Challenge on Speech Processing Using Discrete Units

Jun 11, 2024

To what extent can ASV systems naturally defend against spoofing attacks?

Jun 08, 2024

URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement

Jun 07, 2024

Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement

Jun 06, 2024

4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders

Jun 05, 2024