"speech": models, code, and papers

Leveraging Low-Distortion Target Estimates for Improved Speech Enhancement

Oct 01, 2021
Zhong-Qiu Wang, Gordon Wichern, Jonathan Le Roux

Via arXiv

A Convolutional Neural Network Based Approach to Recognize Bangla Spoken Digits from Speech Signal

Nov 12, 2021
Ovishake Sen, Al-Mahmud, Pias Roy

Via arXiv

RyanSpeech: A Corpus for Conversational Text-to-Speech Synthesis

Jun 15, 2021
Rohola Zandie, Mohammad H. Mahoor, Julia Madsen, Eshrat S. Emamian

Via arXiv

OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models

Dec 08, 2022
Jinze Bai, Rui Men, Hao Yang, Xuancheng Ren, Kai Dang, Yichang Zhang, Xiaohuan Zhou, Peng Wang, Sinan Tan, An Yang, Zeyu Cui, Yu Han, Shuai Bai, Wenbin Ge, Jianxin Ma, Junyang Lin, Jingren Zhou, Chang Zhou

Via arXiv

Improving Label-Deficient Keyword Spotting Using Self-Supervised Pretraining

Oct 04, 2022
Holger Severin Bovbjerg, Zheng-Hua Tan

Via arXiv

Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features

Aug 02, 2022
Jun Xue, Cunhang Fan, Zhao Lv, Jianhua Tao, Jiangyan Yi, Chengshi Zheng, Zhengqi Wen, Minmin Yuan, Shegang Shao

Via arXiv

Semantic Communications for Speech Recognition

Jul 22, 2021
Zhenzi Weng, Zhijin Qin, Geoffrey Ye Li

Via arXiv

Accent-Robust Automatic Speech Recognition Using Supervised and Unsupervised Wav2vec Embeddings

Oct 08, 2021
Jialu Li, Vimal Manohar, Pooja Chitkara, Andros Tjandra, Michael Picheny, Frank Zhang, Xiaohui Zhang, Yatharth Saraf

Via arXiv

S-DCCRN: Super Wide Band DCCRN with learnable complex feature for speech enhancement

Nov 16, 2021
Shubo Lv, Yihui Fu, Mengtao Xing, Jiayao Sun, Lei Xie, Jun Huang, Yannan Wang, Tao Yu

Via arXiv

Kosp2e: Korean Speech to English Translation Corpus

Jul 06, 2021
Won Ik Cho, Seok Min Kim, Hyunchang Cho, Nam Soo Kim

Via arXiv