"speech": models, code, and papers

Speech Emotion Diarization: Which Emotion Appears When?

Jun 22, 2023
Yingzhi Wang, Mirco Ravanelli, Alaa Nfissi, Alya Yacoubi

Via arXiv

Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP

Sep 11, 2023
Jinzuomu Zhong, Yang Li, Hui Huang, Jie Liu, Zhiba Su, Jing Guo, Benlai Tang, Fengjie Zhu

Via arXiv

Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation

Jul 23, 2023
Yoshiki Masuyama, Xuankai Chang, Wangyou Zhang, Samuele Cornell, Zhong-Qiu Wang, Nobutaka Ono, Yanmin Qian, Shinji Watanabe

Via arXiv

Prompting Large Language Models with Speech Recognition Abilities

Jul 21, 2023
Yassir Fathullah, Chunyang Wu, Egor Lakomkin, Junteng Jia, Yuan Shangguan, Ke Li, Jinxi Guo, Wenhan Xiong, Jay Mahadeokar, Ozlem Kalinli, Christian Fuegen, Mike Seltzer

Via arXiv

Audio-AdapterFusion: A Task-ID-free Approach for Efficient and Non-Destructive Multi-task Speech Recognition

Oct 17, 2023
Hillary Ngai, Rohan Agrawal, Neeraj Gaur, Ronny Huang, Parisa Haghani, Pedro Moreno Mengibar

Via arXiv

HowToCaption: Prompting LLMs to Transform Video Annotations at Scale

Oct 07, 2023
Nina Shvetsova, Anna Kukleva, Xudong Hong, Christian Rupprecht, Bernt Schiele, Hilde Kuehne

Via arXiv

Unintended Memorization in Large ASR Models, and How to Mitigate It

Oct 18, 2023
Lun Wang, Om Thakkar, Rajiv Mathews

Via arXiv

Improving Short Utterance Anti-Spoofing with AASIST2

Sep 15, 2023
Yuxiang Zhang, Jingze Lu, Zengqiang Shang, Wenchao Wang, Pengyuan Zhang

Via arXiv

CQNV: A combination of coarsely quantized bitstream and neural vocoder for low rate speech coding

Jul 25, 2023
Youqiang Zheng, Li Xiao, Weiping Tu, Yuhong Yang, Xinmeng Xu

Via arXiv

Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation

Sep 19, 2023
Luyao Cheng, Siqi Zheng, Qinglin Zhang, Hui Wang, Yafeng Chen, Qian Chen, Shiliang Zhang

Via arXiv