
"speech": models, code, and papers

Sequence Modeling using Gated Recurrent Neural Networks

Jan 01, 2015
Mohammad Pezeshki

In this paper, we use Recurrent Neural Networks to capture and model human motion data and to generate motions by predicting the next immediate data point at each time step. Our RNN is equipped with the recently proposed Gated Recurrent Units, which have shown promising results on sequence modeling problems such as machine translation and speech synthesis. We demonstrate that this model can capture long-term dependencies in the data and generate realistic motions.
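The gated update at the heart of such a model can be sketched with a scalar GRU cell; the weights and input sequence below are illustrative, not the paper's trained model:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h, p):
    """One GRU update for a scalar input/state; p holds the six weights."""
    z = sigmoid(p["wz"] * x + p["uz"] * h)               # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h)               # reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h))  # candidate state
    return (1.0 - z) * h + z * h_cand                    # interpolate old/new

# Illustrative (untrained) weights; the state h summarizes the sequence
# so far and would feed a layer predicting the next data point.
params = {"wz": 0.5, "uz": 0.5, "wr": 0.5, "ur": 0.5, "wh": 1.0, "uh": 1.0}
h = 0.0
for x in [0.1, 0.2, 0.3]:
    h = gru_step(x, h, params)
print(round(h, 4))
```

The update gate z lets the cell carry state across many steps unchanged, which is what allows long-term dependencies to survive.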


A Compact Architecture for Dialogue Management Based on Scripts and Meta-Outputs

Jun 09, 2000
Manny Rayner, Beth Ann Hockey, Frankie James

We describe an architecture for spoken dialogue interfaces to semi-autonomous systems that transforms speech signals through successive representations of linguistic, dialogue, and domain knowledge. Each step produces an output, and a meta-output describing the transformation, with an executable program in a simple scripting language as the final result. The output/meta-output distinction permits perspicuous treatment of diverse tasks such as resolving pronouns, correcting user misconceptions, and optimizing scripts.

* Language Technology Joint Conference ANLP-NAACL 2000. 29 April - 4 May 2000, Seattle, WA 
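The output/meta-output split can be illustrated with a toy pipeline stage; the function name and meta fields below are assumptions for illustration, not the paper's scripting language:

```python
# Each stage returns an (output, meta_output) pair: the transformed
# representation plus a record describing the transformation.
# Names and fields are illustrative, not the paper's API.
def resolve_pronouns(utterance, context):
    meta = {"stage": "pronoun_resolution",
            "substituted": "it" in utterance.split()}
    resolved = utterance.replace("it", context["last_object"])
    return resolved, meta

out, meta = resolve_pronouns("move it left", {"last_object": "the rover"})
print(out)   # -> move the rover left
print(meta)  # -> {'stage': 'pronoun_resolution', 'substituted': True}
```

Keeping the meta-output separate is what lets later stages report on user misconceptions without corrupting the executable output itself.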


Practical experiments with regular approximation of context-free languages

Oct 25, 1999
Mark-Jan Nederhof

Several methods are discussed that construct a finite automaton given a context-free grammar, including both methods that lead to subsets and those that lead to supersets of the original context-free language. Some of these methods of regular approximation are new, and some others are presented here in a more refined form with respect to existing literature. Practical experiments with the different methods of regular approximation are performed for spoken-language input: hypotheses from a speech recognizer are filtered through a finite automaton.

* 28 pages. To appear in Computational Linguistics 26(1), March 2000 
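The filtering setup can be sketched as follows: a superset approximation accepts a regular superset of the context-free language, so any recognizer hypothesis the automaton rejects can be safely discarded. The toy grammar below, a^n b^n approximated by the regular language a*b*, is my own example, not one from the paper:

```python
def dfa_accepts(transitions, start, finals, tokens):
    """Run a deterministic finite automaton over a token sequence."""
    state = start
    for tok in tokens:
        state = transitions.get((state, tok))
        if state is None:
            return False  # no transition: hypothesis filtered out
    return state in finals

# Superset approximation of the CFL {a^n b^n}: the regular language a*b*.
trans = {(0, "a"): 0, (0, "b"): 1, (1, "b"): 1}
print(dfa_accepts(trans, 0, {0, 1}, ["a", "a", "b", "b"]))  # True: kept
print(dfa_accepts(trans, 0, {0, 1}, ["b", "a"]))            # False: discarded
```

A subset approximation would flip the guarantee: anything accepted is certainly in the language, at the cost of rejecting some valid strings.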


Transducers from Rewrite Rules with Backreferences

Apr 15, 1999
Dale Gerdemann, Gertjan van Noord

Context sensitive rewrite rules have been widely used in several areas of natural language processing, including syntax, morphology, phonology and speech processing. Kaplan and Kay, Karttunen, and Mohri & Sproat have given various algorithms to compile such rewrite rules into finite-state transducers. The present paper extends this work by allowing a limited form of backreferencing in such rules. The explicit use of backreferencing leads to more elegant and general solutions.

* 8 pages, EACL 1999 Bergen 
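The flavor of backreferencing involved can be illustrated with a regular-expression rewrite whose replacement copies the matched span; the bracket-insertion rule below is my own toy example, not one from the paper:

```python
import re

# A rewrite rule with a backreference: the output reuses the matched
# span itself (\1), as in markup rules that bracket whatever matched.
def mark_digits(s):
    return re.sub(r"(\d+)", r"[\1]", s)

print(mark_digits("room 42, floor 3"))  # -> room [42], floor [3]
```

Without backreferencing, such a rule would have to enumerate every possible matched string on its right-hand side, which is why allowing even a limited form yields more general rules.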


Robust stochastic parsing using the inside-outside algorithm

Dec 19, 1994
Ted Briscoe, Nick Waegner

The paper describes a parser of sequences of (English) part-of-speech labels which utilises a probabilistic grammar trained using the inside-outside algorithm. The initial (meta)grammar is defined by a linguist and further rules compatible with metagrammatical constraints are automatically generated. During training, rules with very low probability are rejected yielding a wide-coverage parser capable of ranking alternative analyses. A series of corpus-based experiments describe the parser's performance.

* Revised and updated version of paper from AAAI Workshop on Probabilistically-based Natural Language Processing Techniques, 1992, 16 pages, uuencoded, compressed postscript 
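Ranking alternative analyses with a trained probabilistic grammar reduces to comparing derivation probabilities, i.e. the product of the probabilities of the rules used; the rules and numbers below are illustrative, not values learned by inside-outside training:

```python
from math import prod

# Toy rule probabilities (illustrative, not trained).
rule_prob = {
    ("S", ("NP", "VP")): 0.9,
    ("NP", ("DT", "NN")): 0.6,
    ("NP", ("NN",)): 0.3,
    ("VP", ("VB", "NP")): 0.7,
}

def derivation_prob(rules):
    """Probability of a derivation = product of its rule probabilities."""
    return prod(rule_prob[r] for r in rules)

a = derivation_prob([("S", ("NP", "VP")), ("NP", ("DT", "NN")),
                     ("VP", ("VB", "NP")), ("NP", ("DT", "NN"))])
b = derivation_prob([("S", ("NP", "VP")), ("NP", ("NN",)),
                     ("VP", ("VB", "NP")), ("NP", ("NN",))])
print(a > b)  # the first analysis is ranked higher
```

Pruning rules whose trained probability falls below a threshold, as the abstract describes, simply removes them from such a table before parsing.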


Use of Machine Learning Technique to maximize the signal over background for $H \rightarrow \tau\tau$

Jul 07, 2021
Kanhaiya Gupta

In recent years, artificial neural networks (ANNs) have won numerous contests in pattern recognition and machine learning. ANNs have been applied to problems ranging from speech recognition to prediction of protein secondary structure, classification of cancers, and gene prediction. Here, we aim to maximize the chances of finding the Higgs boson decaying to two $\tau$ leptons in a pseudo dataset, using a machine learning technique to classify the recorded events as signal or background.

* 9 pages, 14 figures 
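The signal/background decision at the core of such a classifier can be sketched with a single sigmoid unit; the features, weights, and threshold below are illustrative, not the trained network from the paper:

```python
import math

def classify(features, weights, bias, threshold=0.5):
    """Score an event with one sigmoid unit; above threshold = signal."""
    s = sum(w * f for w, f in zip(weights, features)) + bias
    p = 1.0 / (1.0 + math.exp(-s))
    return ("signal" if p > threshold else "background"), p

# Illustrative kinematic features (e.g. invariant mass, missing energy).
label, p = classify([1.2, 0.8], weights=[2.0, 1.5], bias=-2.5)
print(label, round(p, 3))
```

A full ANN stacks many such units; raising the threshold trades signal efficiency for background rejection.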


Automated Word Stress Detection in Russian

Jul 12, 2019
Maria Ponomareva, Kirill Milintsevich, Ekaterina Chernyak, Anatoly Starostin

In this study we address the problem of automated word stress detection in Russian using character-level models and no part-of-speech taggers. We use a simple bidirectional RNN with LSTM nodes and achieve an accuracy of 90% or higher. We experiment with two training datasets and show that using the data from an annotated corpus is much more efficient than using a dictionary, since it allows us to take into account word frequencies and the morphological context of the word.

* Published in Proceedings of the First Workshop on Subword and Character Level Models in NLP, pages 31–35, Copenhagen, Denmark, September 7, 2017 
* SCLeM 2017 
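The character-level input such a model consumes can be sketched as a simple character-to-id encoding; the vocabulary-building scheme below is a common convention, not necessarily the authors' exact preprocessing:

```python
def encode(word, vocab):
    """Map each character to an integer id, growing the vocab as needed."""
    return [vocab.setdefault(ch, len(vocab)) for ch in word]

vocab = {}
print(encode("молоко", vocab))  # -> [0, 1, 2, 1, 3, 1]
print(encode("корова", vocab))  # reuses ids for characters seen before
```

These integer sequences would then be embedded and fed to the bidirectional LSTM, which predicts the stressed position per character.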


How2: A Large-scale Dataset for Multimodal Language Understanding

Nov 01, 2018
Ramon Sanabria, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, Florian Metze

In this paper, we introduce How2, a multimodal collection of instructional videos with English subtitles and crowdsourced Portuguese translations. We also present integrated sequence-to-sequence baselines for machine translation, automatic speech recognition, spoken language translation, and multimodal summarization. By making available data and code for several multimodal natural language tasks, we hope to stimulate more research on these and similar challenges, to obtain a deeper understanding of multimodality in language processing.


Understanding Abuse: A Typology of Abusive Language Detection Subtasks

May 30, 2017
Zeerak Waseem, Thomas Davidson, Dana Warmsley, Ingmar Weber

As the body of research on abusive language detection and analysis grows, there is a need for critical consideration of the relationships between the different subtasks that have been grouped under this label. Based on work on hate speech, cyberbullying, and online abuse, we propose a typology that captures central similarities and differences between subtasks, and we discuss its implications for data annotation and feature construction. We emphasize the practical actions researchers can take to best approach their abusive language detection subtask of interest.

* To appear in the proceedings of the 1st Workshop on Abusive Language Online. Please cite that version 
