"speech": models, code, and papers

SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis

Nov 29, 2023
Ziqiao Peng, Wentao Hu, Yue Shi, Xiangyu Zhu, Xiaomei Zhang, Hao Zhao, Jun He, Hongyan Liu, Zhaoxin Fan

Modeling Uncertainty in Personalized Emotion Prediction with Normalizing Flows

Dec 10, 2023
Piotr Miłkowski, Konrad Karanowski, Patryk Wielopolski, Jan Kocoń, Przemysław Kazienko, Maciej Zięba

Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition

Sep 19, 2023
Ziyang Ma, Wen Wu, Zhisheng Zheng, Yiwei Guo, Qian Chen, Shiliang Zhang, Xie Chen

CorrTalk: Correlation Between Hierarchical Speech and Facial Activity Variances for 3D Animation

Oct 17, 2023
Zhaojie Chu, Kailing Guo, Xiaofen Xing, Yilin Lan, Bolun Cai, Xiangmin Xu

Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition

Oct 16, 2023
Srijith Radhakrishnan, Chao-Han Huck Yang, Sumeer Ahmad Khan, Rohit Kumar, Narsis A. Kiani, David Gomez-Cabrero, Jesper N. Tegner

JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions

Oct 09, 2023
Detai Xin, Junfeng Jiang, Shinnosuke Takamichi, Yuki Saito, Akiko Aizawa, Hiroshi Saruwatari

Collaborative Watermarking for Adversarial Speech Synthesis

Sep 26, 2023
Lauri Juvela, Xin Wang

Personalization of CTC-based End-to-End Speech Recognition Using Pronunciation-Driven Subword Tokenization

Oct 16, 2023
Zhihong Lei, Ernest Pusateri, Shiyi Han, Leo Liu, Mingbin Xu, Tim Ng, Ruchir Travadi, Youyuan Zhang, Mirko Hannemann, Man-Hung Siu, Zhen Huang

Do self-supervised speech and language models extract similar representations as human brain?

Oct 07, 2023
Peili Chen, Linyang He, Li Fu, Lu Fan, Edward F. Chang, Yuanning Li

Deep Beamforming for Speech Enhancement and Speaker Localization with an Array Response-Aware Loss Function

Oct 19, 2023
Hsinyu Chang, Yicheng Hsu, Mingsian R. Bai
