"speech": models, code, and papers

Svarah: Evaluating English ASR Systems on Indian Accents

May 25, 2023
Tahir Javed, Sakshi Joshi, Vignesh Nagarajan, Sai Sundaresan, Janki Nawale, Abhigyan Raman, Kaushal Bhogale, Pratyush Kumar, Mitesh M. Khapra

Via arXiv

The Ethical Implications of Generative Audio Models: A Systematic Literature Review

Jul 07, 2023
Julia Barnett

Via arXiv

How Good is Automatic Segmentation as a Multimodal Discourse Annotation Aid?

May 27, 2023
Corbyn Terpstra, Ibrahim Khebour, Mariah Bradford, Brett Wisniewski, Nikhil Krishnaswamy, Nathaniel Blanchard

Via arXiv

Dialog act guided contextual adapter for personalized speech recognition

Mar 31, 2023
Feng-Ju Chang, Thejaswi Muniyappa, Kanthashree Mysore Sathyendra, Kai Wei, Grant P. Strimel, Ross McGowan

Via arXiv

MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

Jun 06, 2023
Yizhi Li, Ruibin Yuan, Ge Zhang, Yinghao Ma, Xingran Chen, Hanzhi Yin, Chenghua Lin, Anton Ragni, Emmanouil Benetos, Norbert Gyenge, Roger Dannenberg, Ruibo Liu, Wenhu Chen, Gus Xia, Yemin Shi, Wenhao Huang, Yike Guo, Jie Fu

Via arXiv

HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models

Jun 12, 2023
Ji-Sang Hwang, Sang-Hoon Lee, Seong-Whan Lee

Via arXiv

Improved Decoding of Attentional Selection in Multi-Talker Environments with Self-Supervised Learned Speech Representation

Feb 11, 2023
Cong Han, Vishal Choudhari, Yinghao Aaron Li, Nima Mesgarani

Via arXiv

The NPU-Elevoc Personalized Speech Enhancement System for ICASSP2023 DNS Challenge

Mar 15, 2023
Xiaopeng Yan, Yindi Yang, Zhihao Guo, Liangliang Peng, Lei Xie

Via arXiv

Multi-level Temporal-channel Speaker Retrieval for Robust Zero-shot Voice Conversion

May 12, 2023
Zhichao Wang, Liumeng Xue, Qiuqiang Kong, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang

Via arXiv

SLABERT Talk Pretty One Day: Modeling Second Language Acquisition with BERT

May 31, 2023
Aditya Yadavalli, Alekhya Yadavalli, Vera Tobin

Via arXiv