"speech": models, code, and papers

Masks Fusion with Multi-Target Learning For Speech Enhancement

Sep 28, 2021
Liangchen Zhou, Wenbin Jiang, Jingyan Xu, Fei Wen, Peilin Liu

Classifying Autism from Crowdsourced Semi-Structured Speech Recordings: A Machine Learning Approach

Jan 04, 2022
Nathan A. Chi, Peter Washington, Aaron Kline, Arman Husic, Cathy Hou, Chloe He, Kaitlyn Dunlap, Dennis Wall

Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units

Oct 31, 2021
Anurag Katakkar, Alan W Black

Dyadic Interaction Assessment from Free-living Audio for Depression Severity Assessment

Sep 08, 2022
Bishal Lamichhane, Nidal Moukaddam, Ankit B. Patel, Ashutosh Sabharwal

Speech watermarking: a solution for authentication of forensic audio digital recordings

Feb 23, 2022
Marcos Faundez-Zanuy, Jose Juan Lucena-Molina, Martin Hagmueller, Gernot Kubin

NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS

Nov 04, 2022
Dongchao Yang, Songxiang Liu, Jianwei Yu, Helin Wang, Chao Weng, Yuexian Zou

FedSpeech: Federated Text-to-Speech with Continual Learning

Oct 14, 2021
Ziyue Jiang, Yi Ren, Ming Lei, Zhou Zhao

4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders

Dec 21, 2022
Yui Sudo, Muhammad Shakeel, Brian Yan, Jiatong Shi, Shinji Watanabe

Framewise WaveGAN: High Speed Adversarial Vocoder in Time Domain with Very Low Computational Complexity

Dec 08, 2022
Ahmed Mustafa, Jean-Marc Valin, Jan Büthe, Paris Smaragdis, Mike Goodwin

The Sound of Silence: Efficiency of First Digit Features in Synthetic Audio Detection

Oct 06, 2022
Daniele Mari, Federica Latora, Simone Milani
