"speech": models, code, and papers

Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model

May 31, 2023
Héctor Martel, Julius Richter, Kai Li, Xiaolin Hu, Timo Gerkmann

Emotions Beyond Words: Non-Speech Audio Emotion Recognition With Edge Computing

May 01, 2023
Ibrahim Malik, Siddique Latif, Sanaullah Manzoor, Muhammad Usama, Junaid Qadir, Raja Jurdak

Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement

Jun 14, 2023
Zilu Guo, Jun Du, Chin-Hui Lee, Yu Gao, Wenbin Zhang

iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN

Aug 14, 2023
Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Shogo Seki

Cascaded Cross-Modal Transformer for Request and Complaint Detection

Jul 27, 2023
Nicolae-Catalin Ristea, Radu Tudor Ionescu

Employing Hybrid Deep Neural Networks on Dari Speech

May 04, 2023
Jawid Ahmad Baktash, Mursal Dawodi

DiscoverPath: A Knowledge Refinement and Retrieval System for Interdisciplinarity on Biomedical Research

Sep 04, 2023
Yu-Neng Chuang, Guanchu Wang, Chia-Yuan Chang, Kwei-Herng Lai, Daochen Zha, Ruixiang Tang, Fan Yang, Alfredo Costilla Reyes, Kaixiong Zhou, Xiaoqian Jiang, Xia Hu

CIF-PT: Bridging Speech and Text Representations for Spoken Language Understanding via Continuous Integrate-and-Fire Pre-Training

May 27, 2023
Linhao Dong, Zhecheng An, Peihao Wu, Jun Zhang, Lu Lu, Zejun Ma

Improving Fairness and Robustness in End-to-End Speech Recognition through unsupervised clustering

Jun 06, 2023
Irina-Elena Veliche, Pascale Fung

XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech

May 31, 2023
Linh The Nguyen, Thinh Pham, Dat Quoc Nguyen
