"speech": models, code, and papers

EffCRN: An Efficient Convolutional Recurrent Network for High-Performance Speech Enhancement

Jun 05, 2023
Marvin Sach, Jan Franzen, Bruno Defraene, Kristoff Fluyt, Maximilian Strake, Wouter Tirry, Tim Fingscheidt

Via arXiv

Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS

Aug 03, 2023
Myeongjin Ko, Yong-Hoon Choi

Via arXiv

On-Device Constrained Self-Supervised Speech Representation Learning for Keyword Spotting via Knowledge Distillation

Jul 06, 2023
Gene-Ping Yang, Yue Gu, Qingming Tang, Dongsu Du, Yuzong Liu

Via arXiv

2-bit Conformer quantization for automatic speech recognition

May 26, 2023
Oleg Rybakov, Phoenix Meadowlark, Shaojin Ding, David Qiu, Jian Li, David Rim, Yanzhang He

Via arXiv

Classifying Dementia in the Presence of Depression: A Cross-Corpus Study

Aug 16, 2023
Franziska Braun, Sebastian P. Bayerl, Paula A. Pérez-Toro, Florian Hönig, Hartmut Lehfeld, Thomas Hillemacher, Elmar Nöth, Tobias Bocklet, Korbinian Riedhammer

Via arXiv

Audio Diffusion Model for Speech Synthesis: A Survey on Text To Speech and Speech Enhancement in Generative AI

Mar 23, 2023
Chenshuang Zhang, Chaoning Zhang, Sheng Zheng, Mengchun Zhang, Maryam Qamar, Sung-Ho Bae, In So Kweon

Via arXiv

RAMP: Retrieval-Augmented MOS Prediction via Confidence-based Dynamic Weighting

Aug 31, 2023
Hui Wang, Shiwan Zhao, Xiguang Zheng, Yong Qin

Via arXiv

A vector quantized masked autoencoder for audiovisual speech emotion recognition

May 05, 2023
Samir Sadok, Simon Leglaive, Renaud Séguier

Via arXiv

Diffusion-Based Mel-Spectrogram Enhancement for Personalized Speech Synthesis with Found Data

May 18, 2023
Yusheng Tian, Wei Liu, Tan Lee

Via arXiv

Speech Separation based on Contrastive Learning and Deep Modularization

May 18, 2023
Peter Ochieng

Via arXiv