
"speech": models, code, and papers

On the Impact of Noises in Crowd-Sourced Data for Speech Translation

Jun 28, 2022
Siqi Ouyang, Rong Ye, Lei Li

Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey

Feb 20, 2023
Xiao Wang, Guangyao Chen, Guangwu Qian, Pengcheng Gao, Xiao-Yong Wei, Yaowei Wang, Yonghong Tian, Wen Gao

Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation

Feb 18, 2022
Disong Wang, Songxiang Liu, Xixin Wu, Hui Lu, Lifa Sun, Xunying Liu, Helen Meng

Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition

Sep 17, 2022
Ye Bai, Jie Li, Wenjing Han, Hao Ni, Kaituo Xu, Zhuo Zhang, Cheng Yi, Xiaorui Wang

Distinguishing between pre- and post-treatment in the speech of patients with chronic obstructive pulmonary disease

Jul 26, 2022
Andreas Triantafyllopoulos, Markus Fendler, Anton Batliner, Maurice Gerczuk, Shahin Amiriparian, Thomas M. Berghaus, Björn W. Schuller

An enhanced Conv-TasNet model for speech separation using a speaker distance-based loss function

Jun 01, 2022
Jose A. Arango-Sánchez, Julián D. Arias-Londoño

Robust Federated Learning Against Adversarial Attacks for Speech Emotion Recognition

Mar 09, 2022
Yi Chang, Sofiane Laridi, Zhao Ren, Gregory Palmer, Björn W. Schuller, Marco Fisichella

Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models

Sep 29, 2022
Raphael Olivier, Hadi Abdullah, Bhiksha Raj

Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models

Oct 07, 2021
Liang-Hsuan Tseng, Yu-Kuan Fu, Heng-Jui Chang, Hung-yi Lee

Adaptive Activation Network For Low Resource Multilingual Speech Recognition

May 28, 2022
Jian Luo, Jianzong Wang, Ning Cheng, Zhenpeng Zheng, Jing Xiao
