"speech": models, code, and papers

Spatial Diarization for Meeting Transcription with Ad-Hoc Acoustic Sensor Networks

Nov 27, 2023
Tobias Gburrek, Joerg Schmalenstroeer, Reinhold Haeb-Umbach

(3 figures)

AV2Wav: Diffusion-Based Re-synthesis from Continuous Self-supervised Features for Audio-Visual Speech Enhancement

Sep 14, 2023
Ju-Chieh Chou, Chung-Ming Chien, Karen Livescu

(4 figures)

A-JEPA: Joint-Embedding Predictive Architecture Can Listen

Nov 28, 2023
Zhengcong Fei, Mingyuan Fan, Junshi Huang


Decoding Emotions: A Comprehensive Multilingual Study of Speech Models for Speech Emotion Recognition

Aug 17, 2023
Anant Singh, Akshat Gupta

(4 figures)

Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions

Sep 16, 2023
Heming Wang, Meng Yu, Hao Zhang, Chunlei Zhang, Zhongweiyang Xu, Muqiao Yang, Yixuan Zhang, Dong Yu

(4 figures)

Effect of Attention and Self-Supervised Speech Embeddings on Non-Semantic Speech Tasks

Aug 28, 2023
Payal Mohapatra, Akash Pandey, Yueyuan Sui, Qi Zhu


Evaluating Self-Supervised Speech Representations for Indigenous American Languages

Oct 08, 2023
Chih-Chen Chen, William Chen, Rodolfo Zevallos, John E. Ortega


VaSAB: The variable size adaptive information bottleneck for disentanglement on speech and singing voice

Oct 05, 2023
Frederik Bous, Axel Roebel

(2 figures)

Soft Random Sampling: A Theoretical and Empirical Analysis

Nov 24, 2023
Xiaodong Cui, Ashish Mittal, Songtao Lu, Wei Zhang, George Saon, Brian Kingsbury


Rep2wav: Noise Robust text-to-speech Using self-supervised representations

Sep 04, 2023
Qiushi Zhu, Yu Gu, Rilin Chen, Chao Weng, Yuchen Hu, Lirong Dai, Jie Zhang

(4 figures)