Shinji Watanabe

Carnegie Mellon University

Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels

Sep 16, 2024
Via arXiv

ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration

Sep 14, 2024
Via arXiv

Text-To-Speech Synthesis In The Wild

Sep 13, 2024
Via arXiv

Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm

Sep 11, 2024
Via arXiv

Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition

Aug 17, 2024
Via arXiv

CMU's IWSLT 2024 Simultaneous Speech Translation System

Aug 14, 2024
Via arXiv

SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data

Aug 01, 2024
Via arXiv

The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization

Jul 23, 2024
Via arXiv

Multi-Convformer: Extending Conformer with Multiple Convolution Kernels

Jul 04, 2024
Via arXiv

Towards Robust Speech Representation Learning for Thousands of Languages

Jul 02, 2024
Via arXiv