"speech": models, code, and papers

Noisy Speech Based Temporal Decomposition to Improve Fundamental Frequency Estimation

Dec 18, 2021
A. Queiroz, R. Coelho

A Streamwise GAN Vocoder for Wideband Speech Coding at Very Low Bit Rate

Aug 09, 2021
Ahmed Mustafa, Jan Büthe, Srikanth Korse, Kishan Gupta, Guillaume Fuchs, Nicola Pia

Supervised Contrastive Learning for Accented Speech Recognition

Jul 02, 2021
Tao Han, Hantao Huang, Ziang Yang, Wei Han

OCD: Learning to Overfit with Conditional Diffusion Models

Oct 10, 2022
Shahar Lutati, Lior Wolf

TransPOS: Transformers for Consolidating Different POS Tagset Datasets

Sep 24, 2022
Alex Li, Ilyas Bankole-Hameed, Ranadeep Singh, Gabriel Shen Han Ng, Akshat Gupta

Word-Level Style Control for Expressive, Non-attentive Speech Synthesis

Nov 19, 2021
Konstantinos Klapsas, Nikolaos Ellinas, June Sig Sung, Hyoungmin Park, Spyros Raptis

Fully Automated End-to-End Fake Audio Detection

Aug 20, 2022
Chenglong Wang, Jiangyan Yi, Jianhua Tao, Haiyang Sun, Xun Chen, Zhengkun Tian, Haoxin Ma, Cunhang Fan, Ruibo Fu

An Efficient Multitask Learning Architecture for Affective Vocal Burst Analysis

Sep 28, 2022
Tobias Hallmen, Silvan Mertes, Dominik Schiller, Elisabeth André

Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR

Jul 03, 2022
Kun Wei, Yike Zhang, Sining Sun, Lei Xie, Long Ma

Jira: a Kurdish Speech Recognition System Designing and Building Speech Corpus and Pronunciation Lexicon

Feb 15, 2021
Hadi Veisi, Hawre Hosseini, Mohammad Mohammadamini, Wirya Fathy, Aso Mahmudi
