"speech": models, code, and papers

Separate Anything You Describe

Aug 09, 2023
Xubo Liu, Qiuqiang Kong, Yan Zhao, Haohe Liu, Yi Yuan, Yuzhuo Liu, Rui Xia, Yuxuan Wang, Mark D. Plumbley, Wenwu Wang

Via arXiv

Cross-Attribute Matrix Factorization Model with Shared User Embedding

Aug 14, 2023
Wen Liang, Zeng Fan, Youzhi Liang, Jianguo Jia

Via arXiv

Regularizing Contrastive Predictive Coding for Speech Applications

Apr 26, 2023
Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

Via arXiv

AQ-GT: a Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis

May 08, 2023
Hendric Voß, Stefan Kopp

Via arXiv

Investigating the Sensitivity of Automatic Speech Recognition Systems to Phonetic Variation in L2 Englishes

May 12, 2023
Emma O'Neill, Julie Carson-Berndsen

Via arXiv

Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments

Jul 07, 2023
Sara Papi, Peidong Wang, Junkun Chen, Jian Xue, Jinyu Li, Yashesh Gaur

Via arXiv

Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR

Apr 11, 2023
Yuchen Hu, Chen Chen, Qiushi Zhu, Eng Siong Chng

Via arXiv

On the rise of fear speech in online social media

Mar 18, 2023
Punyajoy Saha, Kiran Garimella, Narla Komal Kalyan, Saurabh Kumar Pandey, Pauras Mangesh Meher, Binny Mathew, Animesh Mukherjee

Via arXiv

Adaptation and Optimization of Automatic Speech Recognition (ASR) for the Maritime Domain in the Field of VHF Communication

Jun 01, 2023
Emin Cagatay Nakilcioglu, Maximilian Reimann, Ole John

Via arXiv

Auditory Attention Decoding with Task-Related Multi-View Contrastive Learning

Aug 08, 2023
Xiaoyu Chen, Changde Du, Qiongyi Zhou, Huiguang He

Via arXiv