"speech": models, code, and papers

FDLP-Spectrogram: Capturing Speech Dynamics in Spectrograms for End-to-end Automatic Speech Recognition

Mar 25, 2021
Samik Sadhu, Hynek Hermansky

S-DCCRN: Super Wide Band DCCRN with learnable complex feature for speech enhancement

Nov 16, 2021
Shubo Lv, Yihui Fu, Mengtao Xing, Jiayao Sun, Lei Xie, Jun Huang, Yannan Wang, Tao Yu

The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines

Aug 17, 2022
Gaofeng Cheng, Yifan Chen, Runyan Yang, Qingxuan Li, Zehui Yang, Lingxuan Ye, Pengyuan Zhang, Qingqing Zhang, Lei Xie, Yanmin Qian, Kong Aik Lee, Yonghong Yan

HLT-NUS Submission for 2020 NIST Conversational Telephone Speech SRE

Nov 12, 2021
Rohan Kumar Das, Ruijie Tao, Haizhou Li

Accent-Robust Automatic Speech Recognition Using Supervised and Unsupervised Wav2vec Embeddings

Oct 08, 2021
Jialu Li, Vimal Manohar, Pooja Chitkara, Andros Tjandra, Michael Picheny, Frank Zhang, Xiaohui Zhang, Yatharth Saraf

Universal Adaptor: Converting Mel-Spectrograms Between Different Configurations for Speech Synthesis

Apr 01, 2022
Fan-Lin Wang, Po-chun Hsu, Da-rong Liu, Hung-yi Lee

RyanSpeech: A Corpus for Conversational Text-to-Speech Synthesis

Jun 15, 2021
Rohola Zandie, Mohammad H. Mahoor, Julia Madsen, Eshrat S. Emamian

Prediction of Listener Perception of Argumentative Speech in a Crowdsourced Dataset Using (Psycho-)Linguistic and Fluency Features

Nov 30, 2021
Yu Qiao, Sourabh Zanwar, Rishab Bhattacharyya, Daniel Wiechmann, Wei Zhou, Elma Kerz, Ralf Schlüter

FSER: Deep Convolutional Neural Networks for Speech Emotion Recognition

Sep 15, 2021
Bonaventure F. P. Dossou, Yeno K. S. Gbenou

SEOFP-NET: Compression and Acceleration of Deep Neural Networks for Speech Enhancement Using Sign-Exponent-Only Floating-Points

Nov 08, 2021
Yu-Chen Lin, Cheng Yu, Yi-Te Hsu, Szu-Wei Fu, Yu Tsao, Tei-Wei Kuo
