"speech": models, code, and papers

Kurdish (Sorani) Speech to Text: Presenting an Experimental Dataset

Dec 02, 2019
Akam Qader, Hossein Hassani

Via arXiv

Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders

Oct 25, 2019
Andy T. Liu, Shu-wen Yang, Po-Han Chi, Po-chun Hsu, Hung-yi Lee

Via arXiv

Generalization Ability of MOS Prediction Networks

Oct 18, 2021
Erica Cooper, Wen-Chin Huang, Tomoki Toda, Junichi Yamagishi

Via arXiv

FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition

Jun 03, 2021
Yichong Leng, Xu Tan, Linchen Zhu, Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiang-Yang Li, Ed Lin, Tie-Yan Liu

Via arXiv

Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers

Apr 19, 2021
Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux

Via arXiv

Towards Interpretable Multilingual Detection of Hate Speech against Immigrants and Women in Twitter at SemEval-2019 Task 5

Nov 26, 2020
Alvi Md Ishmam

Via arXiv

Efficient Non-Autoregressive GAN Voice Conversion using VQWav2vec Features and Dynamic Convolution

Mar 31, 2022
Mingjie Chen, Yanghao Zhou, Heyan Huang, Thomas Hain

Via arXiv

Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners

May 24, 2022
Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, Ziyi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji

Via arXiv

Lite Audio-Visual Speech Enhancement

May 24, 2020
Shang-Yi Chuang, Yu Tsao, Chen-Chou Lo, Hsin-Min Wang

Via arXiv

Unsupervised vs. transfer learning for multimodal one-shot matching of speech and images

Aug 14, 2020
Leanne Nortje, Herman Kamper

Via arXiv