
"speech": models, code, and papers

Deep Ad-hoc Beamforming Based on Speaker Extraction for Target-Dependent Speech Separation

Dec 01, 2020
Ziye Yang, Shanzheng Guan, Xiao-Lei Zhang

Via arXiv

A Large-scale Dataset for Hate Speech Detection on Vietnamese Social Media Texts

Apr 05, 2021
Son T. Luu, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Via arXiv

Arabic Code-Switching Speech Recognition using Monolingual Data

Jul 04, 2021
Ahmed Ali, Shammur Chowdhury, Amir Hussein, Yasser Hifny

Via arXiv

Thutmose Tagger: Single-pass neural model for Inverse Text Normalization

Jul 29, 2022
Alexandra Antonova, Evelina Bakhturina, Boris Ginsburg

Via arXiv

Speech enhancement aided end-to-end multi-task learning for voice activity detection

Oct 23, 2020
Xu Tan, Xiao-Lei Zhang

Via arXiv

Joint prediction of truecasing and punctuation for conversational speech in low-resource scenarios

Sep 13, 2021
Raghavendra Pappagari, Piotr Żelasko, Agnieszka Mikołajczyk, Piotr Pęzik, Najim Dehak

Via arXiv

Dictionary Attacks on Speaker Verification

Apr 24, 2022
Mirko Marras, Pawel Korus, Anubhav Jain, Nasir Memon

Via arXiv

Emotion-Controllable Generalized Talking Face Generation

May 02, 2022
Sanjana Sinha, Sandika Biswas, Ravindra Yadav, Brojeshwar Bhowmick

Via arXiv

CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings

Apr 20, 2020
Shinji Watanabe, Michael Mandel, Jon Barker, Emmanuel Vincent

Via arXiv

Maximum Phase Modeling for Sparse Linear Prediction of Speech

Jun 07, 2020
Thomas Drugman

Via arXiv