"speech": models, code, and papers

Generative Data Augmentation Guided by Triplet Loss for Speech Emotion Recognition

Aug 09, 2022
Shijun Wang, Hamed Hemati, Jón Guðnason, Damian Borth

Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction

Oct 28, 2021
Heming Wang, Yao Qian, Xiaofei Wang, Yiming Wang, Chengyi Wang, Shujie Liu, Takuya Yoshioka, Jinyu Li, DeLiang Wang

ConferencingSpeech 2022 Challenge: Non-intrusive Objective Speech Quality Assessment (NISQA) Challenge for Online Conferencing Applications

Apr 01, 2022
Gaoxiong Yi, Wei Xiao, Yiming Xiao, Babak Naderi, Sebastian Möller, Wafaa Wardah, Gabriel Mittag, Ross Cutler, Zhuohuang Zhang, Donald S. Williamson, Fei Chen, Fuzheng Yang, Shidong Shang

STFT-Domain Neural Speech Enhancement with Very Low Algorithmic Latency

Apr 21, 2022
Zhong-Qiu Wang, Gordon Wichern, Shinji Watanabe, Jonathan Le Roux

On the Locality of Attention in Direct Speech Translation

Apr 19, 2022
Belen Alastruey, Javier Ferrando, Gerard I. Gállego, Marta R. Costa-jussà

The PCG-AIID System for L3DAS22 Challenge: MIMO and MISO convolutional recurrent Network for Multi Channel Speech Enhancement and Speech Recognition

Feb 21, 2022
Jingdong Li, Yuanyuan Zhu, Dawei Luo, Yun Liu, Guohui Cui, Zhaoxia Li

SuperVoice: Text-Independent Speaker Verification Using Ultrasound Energy in Human Speech

May 28, 2022
Hanqing Guo, Qiben Yan, Nikolay Ivanov, Ying Zhu, Li Xiao, Eric J. Hunter

Towards Representative Subset Selection for Self-Supervised Speech Recognition

Mar 18, 2022
Abdul Hameed Azeemi, Ihsan Ayyub Qazi, Agha Ali Raza

Do self-supervised speech models develop human-like perception biases?

May 31, 2022
Juliette Millet, Ewan Dunbar

Cognitive Coding of Speech

Oct 08, 2021
Reza Lotfidereshgi, Philippe Gournay
