"speech": models, code, and papers

CALM: Contrastive Cross-modal Speaking Style Modeling for Expressive Text-to-Speech Synthesis

Aug 30, 2023
Yi Meng, Xiang Li, Zhiyong Wu, Tingtian Li, Zixun Sun, Xinyu Xiao, Chi Sun, Hui Zhan, Helen Meng

Streaming Anchor Loss: Augmenting Supervision with Temporal Significance

Oct 09, 2023
Utkarsh Sarawgi, John Berkowitz, Vineet Garg, Arnav Kundu, Minsik Cho, Sai Srujana Buddi, Saurabh Adya, Ahmed Tewfik

Powerset multi-class cross entropy loss for neural speaker diarization

Oct 19, 2023
Alexis Plaquet, Hervé Bredin

Harnessing the Zero-Shot Power of Instruction-Tuned Large Language Model in End-to-End Speech Recognition

Sep 19, 2023
Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi

Quantifying the Dialect Gap and its Correlates Across Languages

Oct 23, 2023
Anjali Kantharuban, Ivan Vulić, Anna Korhonen

InterroLang: Exploring NLP Models and Datasets through Dialogue-based Explanations

Oct 23, 2023
Nils Feldhus, Qianli Wang, Tatiana Anikina, Sahil Chopra, Cennet Oguz, Sebastian Möller

Delayed Memory Unit: Modelling Temporal Dependency Through Delay Gate

Oct 23, 2023
Pengfei Sun, Jibin Wu, Malu Zhang, Paul Devos, Dick Botteldooren

EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Multilingual and Low Resource Scenarios

Oct 05, 2023
Tejes Srivastava, Jiatong Shi, William Chen, Shinji Watanabe

HiGNN-TTS: Hierarchical Prosody Modeling with Graph Neural Networks for Expressive Long-form TTS

Sep 25, 2023
Dake Guo, Xinfa Zhu, Liumeng Xue, Tao Li, Yuanjun Lv, Yuepeng Jiang, Lei Xie

MBTFNet: Multi-Band Temporal-Frequency Neural Network For Singing Voice Enhancement

Oct 06, 2023
Weiming Xu, Zhouxuan Chen, Zhili Tan, Shubo Lv, Runduo Han, Wenjiang Zhou, Weifeng Zhao, Lei Xie
