"speech": models, code, and papers

Speaker verification-derived loss and data augmentation for DNN-based multispeaker speech synthesis

Jun 03, 2021
Beata Lorincz, Adriana Stan, Mircea Giurgiu

Cloud-Based Face and Speech Recognition for Access Control Applications

May 08, 2020
Nathalie Tkauc, Thao Tran, Kevin Hernandez-Diaz, Fernando Alonso-Fernandez

Parallel Neural Text-to-Speech

Jun 05, 2019
Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao

Merkel Podcast Corpus: A Multimodal Dataset Compiled from 16 Years of Angela Merkel's Weekly Video Podcasts

May 24, 2022
Debjoy Saha, Shravan Nayak, Timo Baumann

Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis

Nov 17, 2020
Chung-Ming Chien, Hung-yi Lee

Unsupervised pre-traing for sequence to sequence speech recognition

Oct 28, 2019
Zhiyun Fan, Shiyu Zhou, Bo Xu

Vietnamese Hate and Offensive Detection using PhoBERT-CNN and Social Media Streaming Data

Jun 01, 2022
Khanh Q. Tran, An T. Nguyen, Phu Gia Hoang, Canh Duc Luu, Trong-Hop Do, Kiet Van Nguyen

Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading

Apr 04, 2022
Minsu Kim, Jeong Hun Yeo, Yong Man Ro

Few-Shot Speaker Identification Using Depthwise Separable Convolutional Network with Channel Attention

Apr 24, 2022
Yanxiong Li, Wucheng Wang, Hao Chen, Wenchang Cao, Wei Li, Qianhua He

Neural Architecture Search for Speech Recognition

Jul 17, 2020
Shoukang Hu, Xurong Xie, Shansong Liu, Mengzhe Geng, Xunying Liu, Helen Meng
