
"speech": models, code, and papers

Leveraging Speech Separation for Conversational Telephone Speaker Diarization

Apr 05, 2022
Giovanni Morrone, Samuele Cornell, Desh Raj, Enrico Zovato, Alessio Brutti, Stefano Squartini

Machine Learning for Synthetic Data Generation: a Review

Feb 08, 2023
Yingzhou Lu, Huazheng Wang, Wenqi Wei

A Complementary Joint Training Approach Using Unpaired Speech and Text for Low-Resource Automatic Speech Recognition

Apr 05, 2022
Ye-Qian Du, Jie Zhang, Qiu-Shi Zhu, Li-Rong Dai, Ming-Hui Wu, Xin Fang, Zhou-Wang Yang

FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis

Apr 21, 2022
Rongjie Huang, Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu, Yi Ren, Zhou Zhao

Multi-stage Progressive Compression of Conformer Transducer for On-device Speech Recognition

Oct 01, 2022
Jash Rathod, Nauman Dawalatabad, Shatrughan Singh, Dhananjaya Gowda

Internal Language Model Estimation based Language Model Fusion for Cross-Domain Code-Switching Speech Recognition

Jul 09, 2022
Yizhou Peng, Yufei Liu, Jicheng Zhang, Haihua Xu, Yi He, Hao Huang, Eng Siong Chng

CMGAN: Conformer-based Metric GAN for Speech Enhancement

Mar 28, 2022
Ruizhe Cao, Sherif Abdulatif, Bin Yang

Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion

Jul 05, 2022
Yi Lei, Shan Yang, Jian Cong, Lei Xie, Dan Su

Analyzing Robustness of End-to-End Neural Models for Automatic Speech Recognition

Aug 17, 2022
Goutham Rajendran, Wei Zou

ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models

Feb 09, 2023
Pengfei Zhu, Chao Pang, Shuohuan Wang, Yekun Chai, Yu Sun, Hao Tian, Hua Wu
