"speech": models, code, and papers

Audio Diffusion Model for Speech Synthesis: A Survey on Text To Speech and Speech Enhancement in Generative AI

Mar 23, 2023
Chenshuang Zhang, Chaoning Zhang, Sheng Zheng, Mengchun Zhang, Maryam Qamar, Sung-Ho Bae, In So Kweon

TrustSER: On the Trustworthiness of Fine-tuning Pre-trained Speech Embeddings For Speech Emotion Recognition

May 18, 2023
Tiantian Feng, Rajat Hebbar, Shrikanth Narayanan

Iteratively Improving Speech Recognition and Voice Conversion

May 24, 2023
Mayank Kumar Singh, Naoya Takahashi, Onoe Naoyuki

Semantic VAD: Low-Latency Voice Activity Detection for Speech Interaction

May 21, 2023
Mohan Shi, Yuchun Shu, Lingyun Zuo, Qian Chen, Shiliang Zhang, Jie Zhang, Li-Rong Dai

Non-uniform Speaker Disentanglement For Depression Detection From Raw Speech Signals

Jun 06, 2023
Jinhan Wang, Vijay Ravi, Abeer Alwan

ViLaS: Integrating Vision and Language into Automatic Speech Recognition

May 31, 2023
Minglun Han, Feilong Chen, Ziyi Ni, Linghui Meng, Jing Shi, Shuang Xu, Bo Xu

Active Learning for Classifying 2D Grid-Based Level Completability

Sep 08, 2023
Mahsa Bazzaz, Seth Cooper

Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator

May 25, 2023
Lingwei Meng, Jiawen Kang, Mingyu Cui, Haibin Wu, Xixin Wu, Helen Meng

Bayes Risk Transducer: Transducer with Controllable Alignment Prediction

Aug 19, 2023
Jinchuan Tian, Jianwei Yu, Hangting Chen, Brian Yan, Chao Weng, Dong Yu, Shinji Watanabe

LibriSQA: Advancing Free-form and Open-ended Spoken Question Answering with a Novel Dataset and Framework

Aug 22, 2023
Zihan Zhao, Yiyang Jiang, Heyang Liu, Yanfeng Wang, Yu Wang
