"speech": models, code, and papers

Teach me with a Whisper: Enhancing Large Language Models for Analyzing Spoken Transcripts using Speech Embeddings

Nov 13, 2023
Fatema Hasan, Yulong Li, James Foulds, Shimei Pan, Bishwaranjan Bhattacharjee

Exploring Speech Enhancement for Low-resource Speech Synthesis

Sep 19, 2023
Zhaoheng Ni, Sravya Popuri, Ning Dong, Kohei Saijo, Xiaohui Zhang, Gael Le Lan, Yangyang Shi, Vikas Chandra, Changhan Wang

Dementia Assessment Using Mandarin Speech with an Attention-based Speech Recognition Encoder

Oct 06, 2023
Zih-Jyun Lin, Yi-Ju Chen, Po-Chih Kuo, Likai Huang, Chaur-Jong Hu, Cheng-Yu Chen

THOS: A Benchmark Dataset for Targeted Hate and Offensive Speech

Nov 11, 2023
Saad Almohaimeed, Saleh Almohaimeed, Ashfaq Ali Shafin, Bogdan Carbunar, Ladislau Bölöni

Towards Streaming Speech-to-Avatar Synthesis

Oct 25, 2023
Tejas S. Prabhune, Peter Wu, Bohan Yu, Gopala K. Anumanchipalli

Multi-Speaker Expressive Speech Synthesis via Semi-supervised Contrastive Learning

Oct 26, 2023
Xinfa Zhu, Yuke Li, Yi Lei, Ning Jiang, Guoqing Zhao, Lei Xie

Towards Probing Contact Center Large Language Models

Dec 26, 2023
Varun Nathan, Ayush Kumar, Digvijay Ingle, Jithendra Vepa

Investigating Zero-Shot Generalizability on Mandarin-English Code-Switched ASR and Speech-to-text Translation of Recent Foundation Models with Self-Supervision and Weak Supervision

Dec 30, 2023
Chih-Kai Yang, Kuan-Po Huang, Ke-Han Lu, Chun-Yi Kuan, Chi-Yuan Hsiao, Hung-yi Lee

Realistic Speech-to-Face Generation with Speech-Conditioned Latent Diffusion Model with Face Prior

Oct 05, 2023
Jinting Wang, Li Liu, Jun Wang, Hei Victor Cheng

DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors

Dec 22, 2023
Federico Landini, Mireia Diez, Themos Stafylakis, Lukáš Burget
