"speech": models, code, and papers

Audio-visual speech enhancement with a deep Kalman filter generative model

Nov 02, 2022
Ali Golmakani, Mostafa Sadeghi, Romain Serizel

Analysis of impact of emotions on target speech extraction and speech separation

Aug 15, 2022
Ján Švec, Kateřina Žmolíková, Martin Kocour, Marc Delcroix, Tsubasa Ochiai, Ladislav Mošner, Jan Černocký

Language-Universal Adapter Learning with Knowledge Distillation for End-to-End Multilingual Speech Recognition

Feb 28, 2023
Zhijie Shen, Wu Guo, Bin Gu

VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

Nov 21, 2022
Qiushi Zhu, Long Zhou, Ziqiang Zhang, Shujie Liu, Binxing Jiao, Jie Zhang, Lirong Dai, Daxin Jiang, Jinyu Li, Furu Wei

Adversarial Data Augmentation Using VAE-GAN for Disordered Speech Recognition

Nov 03, 2022
Zengrui Jin, Xurong Xie, Mengzhe Geng, Tianzi Wang, Shujie Hu, Jiajun Deng, Guinan Li, Xunying Liu

Leveraging Language Identification to Enhance Code-Mixed Text Classification

Jun 08, 2023
Gauri Takawane, Abhishek Phaltankar, Varad Patwardhan, Aryan Patil, Raviraj Joshi, Mukta S. Takalikar

Towards Building Voice-based Conversational Recommender Systems: Datasets, Potential Solutions, and Prospects

Jun 14, 2023
Xinghua Qu, Hongyang Liu, Zhu Sun, Xiang Yin, Yew Soon Ong, Lu Lu, Zejun Ma

Streaming Joint Speech Recognition and Disfluency Detection

Nov 16, 2022
Hayato Futami, Emiru Tsunoo, Kentaro Shibata, Yosuke Kashiwagi, Takao Okuda, Siddhant Arora, Shinji Watanabe

Embedding a Differentiable Mel-cepstral Synthesis Filter to a Neural Speech Synthesis System

Nov 21, 2022
Takenori Yoshimura, Shinji Takaki, Kazuhiro Nakamura, Keiichiro Oura, Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda

Self-supervised Predictive Coding Models Encode Speaker and Phonetic Information in Orthogonal Subspaces

May 21, 2023
Oli Liu, Hao Tang, Sharon Goldwater
