"speech": models, code, and papers

Improving Few-Shot Learning for Talking Face System with TTS Data Augmentation

Mar 09, 2023
Qi Chen, Ziyang Ma, Tao Liu, Xu Tan, Qu Lu, Xie Chen, Kai Yu

Via arXiv

Improving Target Speaker Extraction with Sparse LDA-transformed Speaker Embeddings

Jan 16, 2023
Kai Liu, Xucheng Wan, Ziqing Du, Huan Zhou

Via arXiv

Visually Grounded Keyword Detection and Localisation for Low-Resource Languages

Feb 01, 2023
Kayode Kolawole Olaleye

Via arXiv

A Cyclical Approach to Synthetic and Natural Speech Mismatch Refinement of Neural Post-filter for Low-cost Text-to-speech System

Jul 13, 2022
Yi-Chiao Wu, Patrick Lumban Tobing, Kazuki Yasuhara, Noriyuki Matsunaga, Yamato Ohtani, Tomoki Toda

Via arXiv

UniSyn: An End-to-End Unified Model for Text-to-Speech and Singing Voice Synthesis

Dec 06, 2022
Yi Lei, Shan Yang, Xinsheng Wang, Qicong Xie, Jixun Yao, Lei Xie, Dan Su

Via arXiv

Controlling High-Dimensional Data With Sparse Input

Mar 14, 2023
Dan Andrei Iliescu, Devang Savita Ram Mohan, Tian Huey Teh, Zack Hodari

Via arXiv

QSpeech: Low-Qubit Quantum Speech Application Toolkit

May 26, 2022
Zhenhou Hong, Jianzong Wang, Xiaoyang Qu, Chendong Zhao, Wei Tao, Jing Xiao

Via arXiv

Calibrating Transformers via Sparse Gaussian Processes

Mar 04, 2023
Wenlong Chen, Yingzhen Li

Via arXiv

Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training

Jun 21, 2022
Chengyi Wang, Yiming Wang, Yu Wu, Sanyuan Chen, Jinyu Li, Shujie Liu, Furu Wei

Via arXiv

SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual Speech Representation

May 17, 2022
Sameer Khurana, Antoine Laurent, James Glass

Via arXiv