"speech": models, code, and papers

SD-HuBERT: Self-Distillation Induces Syllabic Organization in HuBERT

Oct 16, 2023
Cheol Jun Cho, Abdelrahman Mohamed, Shang-Wen Li, Alan W Black, Gopala K. Anumanchipalli

The Mason-Alberta Phonetic Segmenter: A forced alignment system based on deep neural networks and interpolation

Oct 24, 2023
Matthew C. Kelley, Scott James Perry, Benjamin V. Tucker

MuLanTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2023

Sep 12, 2023
Zhihang Xu, Shaofei Zhang, Xi Wang, Jiajun Zhang, Wenning Wei, Lei He, Sheng Zhao

A Survey on Online User Aggression: Content Detection and Behavioural Analysis on Social Media Platforms

Nov 15, 2023
Swapnil Mane, Suman Kundu, Rajesh Sharma

Investigating the Emergent Audio Classification Ability of ASR Foundation Models

Nov 15, 2023
Rao Ma, Adian Liusie, Mark J. F. Gales, Kate M. Knill

SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis

Aug 02, 2023
Ramanan Sivaguru, Vasista Sai Lodagala, S Umesh

Unimodal Aggregation for CTC-based Speech Recognition

Sep 15, 2023
Ying Fang, Xiaofei Li

Whisper-MCE: Whisper Model Finetuned for Better Performance with Mixed Languages

Oct 27, 2023
Peng Xie, XingYuan Liu, ZiWei Chen, Kani Chen, Yang Wang

SPGM: Prioritizing Local Features for enhanced speech separation performance

Sep 22, 2023
Jia Qi Yip, Shengkui Zhao, Yukun Ma, Chongjia Ni, Chong Zhang, Hao Wang, Trung Hieu Nguyen, Kun Zhou, Dianwen Ng, Eng Siong Chng, Bin Ma

Diversity-based core-set selection for text-to-speech with linguistic and acoustic features

Sep 15, 2023
Kentaro Seki, Shinnosuke Takamichi, Takaaki Saeki, Hiroshi Saruwatari
