"speech": models, code, and papers

Building speech corpus with diverse voice characteristics for its prompt-based representation

Mar 20, 2024
Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Wataru Nakata, Detai Xin, Hiroshi Saruwatari

Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition

Mar 28, 2024
Yash Jain, David Chan, Pranav Dheram, Aparna Khare, Olabanji Shonibare, Venkatesh Ravichandran, Shalini Ghosh

Cross-Domain Audio Deepfake Detection: Dataset and Analysis

Apr 07, 2024
Yuang Li, Min Zhang, Mengxin Ren, Miaomiao Ma, Daimeng Wei, Hao Yang

Wav2Gloss: Generating Interlinear Glossed Text from Speech

Mar 19, 2024
Taiqi He, Kwanghee Choi, Lindia Tjuatja, Nathaniel R. Robinson, Jiatong Shi, Shinji Watanabe, Graham Neubig, David R. Mortensen, Lori Levin

Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales

Mar 19, 2024
Ayushi Nirmal, Amrita Bhattacharjee, Paras Sheth, Huan Liu

Voice-Assisted Real-Time Traffic Sign Recognition System Using Convolutional Neural Network

Apr 11, 2024
Mayura Manawadu, Udaya Wijenayake

Visually Grounded Speech Models have a Mutual Exclusivity Bias

Mar 20, 2024
Leanne Nortje, Dan Oneaţă, Yevgen Matusevych, Herman Kamper

Analyzing Toxicity in Deep Conversations: A Reddit Case Study

Apr 11, 2024
Vigneshwaran Shankaran, Rajesh Sharma

Charles Translator: A Machine Translation System between Ukrainian and Czech

Apr 10, 2024
Martin Popel, Lucie Poláková, Michal Novák, Jindřich Helcl, Jindřich Libovický, Pavel Straňák, Tomáš Krabač, Jaroslava Hlaváčová, Mariia Anisimova, Tereza Chlaňová

Hatred Stems from Ignorance! Distillation of the Persuasion Modes in Countering Conversational Hate Speech

Mar 18, 2024
Ghadi Alyahya, Abeer Aldayel
