"speech": models, code, and papers

Phonemic Representation and Transcription for Speech to Text Applications for Under-resourced Indigenous African Languages: The Case of Kiswahili

Oct 29, 2022
Ebbie Awino, Lilian Wanzare, Lawrence Muchemi, Barack Wanjawa, Edward Ombui, Florence Indede, Owen McOnyango, Benard Okal

Hey ASR System! Why Aren't You More Inclusive? Automatic Speech Recognition Systems' Bias and Proposed Bias Mitigation Techniques. A Literature Review

Nov 17, 2022
Mikel K. Ngueajio, Gloria Washington

Rhythmic Gesticulator: Rhythm-Aware Co-Speech Gesture Synthesis with Hierarchical Neural Embeddings

Oct 05, 2022
Tenglong Ao, Qingzhe Gao, Yuke Lou, Baoquan Chen, Libin Liu

Multi-Scale Feature Fusion Transformer Network for End-to-End Single Channel Speech Separation

Dec 14, 2022
Yinhao Xu, Jian Zhou, Liang Tao, Hon Keung Kwan

Contextual-Utterance Training for Automatic Speech Recognition

Oct 27, 2022
Alejandro Gomez-Alanis, Lukas Drude, Andreas Schwarz, Rupak Vignesh Swaminathan, Simon Wiesler

Distance-based Weight Transfer for Fine-tuning from Near-field to Far-field Speaker Verification

Mar 01, 2023
Li Zhang, Qing Wang, Hongji Wang, Yue Li, Wei Rao, Yannan Wang, Lei Xie

ToxVis: Enabling Interpretability of Implicit vs. Explicit Toxicity Detection Models with Interactive Visualization

Mar 01, 2023
Uma Gunturi, Xiaohan Ding, Eugenia H. Rho

Towards Building Text-To-Speech Systems for the Next Billion Users

Nov 17, 2022
Gokul Karthik Kumar, Praveen S V, Pratyush Kumar, Mitesh M. Khapra, Karthik Nandakumar

Comparative layer-wise analysis of self-supervised speech models

Nov 08, 2022
Ankita Pasad, Bowen Shi, Karen Livescu

Diverse and Vivid Sound Generation from Text Descriptions

May 03, 2023
Guangwei Li, Xuenan Xu, Lingfeng Dai, Mengyue Wu, Kai Yu
