"speech": models, code, and papers

Style-transfer based Speech and Audio-visual Scene Understanding for Robot Action Sequence Acquisition from Videos

Jun 27, 2023
Chiori Hori, Puyuan Peng, David Harwath, Xinyu Liu, Kei Ota, Siddarth Jain, Radu Corcodel, Devesh Jha, Diego Romeres, Jonathan Le Roux


Regularizing Contrastive Predictive Coding for Speech Applications

Apr 26, 2023
Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak


VoxSnap: X-Large Speaker Verification Dataset on Camera

Aug 14, 2023
Yuke Lin, Xiaoyi Qin, Ming Cheng, Ning Jiang, Guoqing Zhao, Ming Li


Towards cross-language prosody transfer for dialog

Jul 09, 2023
Jonathan E. Avila, Nigel G. Ward


Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition

Jun 01, 2023
Tianyi Xu, Zhanheng Yang, Kaixun Huang, Pengcheng Guo, Ao Zhang, Biao Li, Changru Chen, Chao Li, Lei Xie


An End-to-End Multi-Module Audio Deepfake Generation System for ADD Challenge 2023

Jul 03, 2023
Sheng Zhao, Qilong Yuan, Yibo Duan, Zhuoyue Chen


An Efficient Speech Separation Network Based on Recurrent Fusion Dilated Convolution and Channel Attention

Jun 09, 2023
Junyu Wang


GNCformer Enhanced Self-attention for Automatic Speech Recognition

May 22, 2023
J. Li, Z. Duan, S. Li, X. Yu, G. Yang


Turkish Native Language Identification

Aug 04, 2023
Ahmet Yavuz Uluslu, Gerold Schneider


NoRefER: a Referenceless Quality Metric for Automatic Speech Recognition via Semi-Supervised Language Model Fine-Tuning with Contrastive Learning

Jun 21, 2023
Kamer Ali Yuksel, Thiago Ferreira, Golara Javadi, Mohamed El-Badrashiny, Ahmet Gunduz
