"speech": models, code, and papers

Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction

Jan 31, 2024
Xueyuan Chen, Yuejiao Wang, Xixin Wu, Disong Wang, Zhiyong Wu, Xunying Liu, Helen Meng

Towards Event Extraction from Speech with Contextual Clues

Jan 27, 2024
Jingqi Kang, Tongtong Wu, Jinming Zhao, Guitao Wang, Guilin Qi, Yuan-Fang Li, Gholamreza Haffari

PeriodGrad: Towards Pitch-Controllable Neural Vocoder Based on a Diffusion Probabilistic Model

Feb 22, 2024
Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda

On combining acoustic and modulation spectrograms in an attention LSTM-based system for speech intelligibility level classification

Feb 05, 2024
Ascensión Gallardo-Antolín, Juan M. Montero

Experimental Study: Enhancing Voice Spoofing Detection Models with wav2vec 2.0

Feb 27, 2024
Taein Kang, Soyul Han, Sunmook Choi, Jaejin Seo, Sanghyeok Chung, Seungeun Lee, Seungsang Oh, Il-Youp Kwak

Can Interpretability Layouts Influence Human Perception of Offensive Sentences?

Mar 01, 2024
Thiago Freitas dos Santos, Nardine Osman, Marco Schorlemmer

Mixture to Mixture: Leveraging Close-talk Mixtures as Weak-supervision for Speech Separation

Feb 14, 2024
Zhong-Qiu Wang

FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio

Mar 04, 2024
Chao Xu, Yang Liu, Jiazheng Xing, Weida Wang, Mingze Sun, Jun Dan, Tianxin Huang, Siyuan Li, Zhi-Qi Cheng, Ying Tai, Baigui Sun

Efficient data selection employing Semantic Similarity-based Graph Structures for model training

Feb 22, 2024
Roxana Petcu, Subhadeep Maji

Layer-Wise Analysis of Self-Supervised Acoustic Word Embeddings: A Study on Speech Emotion Recognition

Feb 04, 2024
Alexandra Saliba, Yuanchao Li, Ramon Sanabria, Catherine Lai
