"speech": models, code, and papers

Multilingual Speech-to-Speech Translation into Multiple Target Languages

Jul 17, 2023
Hongyu Gong, Ning Dong, Sravya Popuri, Vedanuj Goswami, Ann Lee, Juan Pino

Privacy-preserving and Privacy-attacking Approaches for Speech and Audio -- A Survey

Sep 26, 2023
Yuchen Liu, Apu Kapadia, Donald Williamson

Comparative Analysis of Transfer Learning in Deep Learning Text-to-Speech Models on a Few-Shot, Low-Resource, Customized Dataset

Oct 08, 2023
Ze Liu

Federated Learning with Differential Privacy for End-to-End Speech Recognition

Sep 29, 2023
Martin Pelikan, Sheikh Shams Azam, Vitaly Feldman, Jan "Honza" Silovsky, Kunal Talwar, Tatiana Likhomanenko

Deepfake audio as a data augmentation technique for training automatic speech to text transcription models

Sep 22, 2023
Alexandre R. Ferreira, Cláudio E. C. Campelo

Augmenting conformers with structured state space models for online speech recognition

Sep 15, 2023
Haozhe Shan, Albert Gu, Zhong Meng, Weiran Wang, Krzysztof Choromanski, Tara Sainath

AudioFool: Fast, Universal and synchronization-free Cross-Domain Attack on Speech Recognition

Sep 20, 2023
Mohamad Fakih, Rouwaida Kanj, Fadi Kurdahi, Mohammed E. Fouda

LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT

Oct 11, 2023
Jiaming Wang, Zhihao Du, Qian Chen, Yunfei Chu, Zhifu Gao, Zerui Li, Kai Hu, Xiaohuan Zhou, Jin Xu, Ziyang Ma, Wen Wang, Siqi Zheng, Chang Zhou, Zhijie Yan, Shiliang Zhang

SememeASR: Boosting Performance of End-to-End Speech Recognition against Domain and Long-Tailed Data Shift with Sememe Semantic Knowledge

Sep 04, 2023
Jiaxu Zhu, Changhe Song, Zhiyong Wu, Helen Meng

BanLemma: A Word Formation Dependent Rule and Dictionary Based Bangla Lemmatizer

Nov 06, 2023
Sadia Afrin, Md. Shahad Mahmud Chowdhury, Md. Ekramul Islam, Faisal Ahamed Khan, Labib Imam Chowdhury, MD. Motahar Mahtab, Nazifa Nuha Chowdhury, Massud Forkan, Neelima Kundu, Hakim Arif, Mohammad Mamun Or Rashid, Mohammad Ruhul Amin, Nabeel Mohammed