Fenglong Xie

SoCodec: A Semantic-Ordered Multi-Stream Speech Codec for Efficient Language Model Based Text-to-Speech Synthesis
Sep 02, 2024

Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder
Jun 05, 2024

QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning
Aug 31, 2023

Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations
Oct 27, 2022

A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS
Sep 22, 2022

Nana-HDR: A Non-attentive Non-autoregressive Hybrid Model for TTS
Sep 28, 2021

Triple M: A Practical Neural Text-to-Speech System with Multi-Guidance Attention and Multi-Band Multi-Time LPCNet
Feb 09, 2021