"speech": models, code, and papers

Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary tasks

Sep 14, 2023
Danae Sánchez Villegas, Daniel Preoţiuc-Pietro, Nikolaos Aletras

Via arXiv

Two-stage Autoencoder Neural Network for 3D Speech Enhancement

Jun 08, 2023
Han Yin, Jisheng Bai, Siwei Huang, Mou Wang, Yafei Jia, Jianfeng Chen

Via arXiv

Text Generation with Speech Synthesis for ASR Data Augmentation

May 22, 2023
Zhuangqun Huang, Gil Keren, Ziran Jiang, Shashank Jain, David Goss-Grubbs, Nelson Cheng, Farnaz Abtahi, Duc Le, David Zhang, Antony D'Avirro, Ethan Campbell-Taylor, Jessie Salas, Irina-Elena Veliche, Xi Chen

Via arXiv

Creating Personalized Synthetic Voices from Post-Glossectomy Speech with Guided Diffusion Models

May 27, 2023
Yusheng Tian, Guangyan Zhang, Tan Lee

Via arXiv

Improving Speech Translation Accuracy and Time Efficiency with Fine-tuned wav2vec 2.0-based Speech Segmentation

Apr 25, 2023
Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura

Via arXiv

N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition

Jun 05, 2023
Bashar Talafha, Abdul Waheed, Muhammad Abdul-Mageed

Via arXiv

FunASR: A Fundamental End-to-End Speech Recognition Toolkit

May 18, 2023
Zhifu Gao, Zerui Li, Jiaming Wang, Haoneng Luo, Xian Shi, Mengzhe Chen, Yabin Li, Lingyun Zuo, Zhihao Du, Zhangyu Xiao, Shiliang Zhang

Via arXiv

Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems

Sep 07, 2023
Takuma Udagawa, Masayuki Suzuki, Gakuto Kurata, Masayasu Muraoka, George Saon

Via arXiv

Cross-Language Speech Emotion Recognition Using Multimodal Dual Attention Transformers

Jul 14, 2023
Syed Aun Muhammad Zaidi, Siddique Latif, Junaid Qadir

Via arXiv

DUB: Discrete Unit Back-translation for Speech Translation

May 19, 2023
Dong Zhang, Rong Ye, Tom Ko, Mingxuan Wang, Yaqian Zhou

Via arXiv