"speech": models, code, and papers

LLaSM: Large Language and Speech Model

Aug 30, 2023
Yu Shu, Siwei Dong, Guangyao Chen, Wenhao Huang, Ruihua Zhang, Daochen Shi, Qiqi Xiang, Yemin Shi

Hierarchical attention interpretation: an interpretable speech-level transformer for bi-modal depression detection

Sep 23, 2023
Qingkun Deng, Saturnino Luz, Sofia de la Fuente Garcia

Discriminative Speech Recognition Rescoring with Pre-trained Language Models

Oct 10, 2023
Prashanth Gurunath Shivakumar, Jari Kolehmainen, Yile Gu, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko

How To Build Competitive Multi-gender Speech Translation Models For Controlling Speaker Gender Translation

Oct 23, 2023
Marco Gaido, Dennis Fucci, Matteo Negri, Luisa Bentivogli

Churn Prediction via Multimodal Fusion Learning: Integrating Customer Financial Literacy, Voice, and Behavioral Data

Dec 03, 2023
David Hason Rudd, Huan Huo, Md Rafiqul Islam, Guandong Xu

Quantifying the redundancy between prosody and text

Nov 28, 2023
Lukas Wolf, Tiago Pimentel, Evelina Fedorenko, Ryan Cotterell, Alex Warstadt, Ethan Wilcox, Tamar Regev

VoxArabica: A Robust Dialect-Aware Arabic Speech Recognition System

Oct 17, 2023
Abdul Waheed, Bashar Talafha, Peter Suvellin, Abdelrahman Elmadney, Muhammad Abdul-Mageed

Efficient Multi-Channel Speech Enhancement with Spherical Harmonics Injection for Directional Encoding

Sep 19, 2023
Jiahui Pan, Pengjie Shen, Hui Zhang, Xueliang Zhang

GRASS: Unified Generation Model for Speech Semantic Understanding

Sep 06, 2023
Aobo Xia, Shuyu Lei, Yushu Yang, Xiang Guo, Hua Chai

Foundation Model Assisted Automatic Speech Emotion Recognition: Transcribing, Annotating, and Augmenting

Sep 15, 2023
Tiantian Feng, Shrikanth Narayanan
