Haohan Guo

UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner

Jun 14, 2024

Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation

Jun 11, 2024

Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder

Jun 05, 2024

SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models

Jun 04, 2024

BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

Feb 15, 2024

Cross-Speaker Encoding Network for Multi-Talker Speech Recognition

Jan 08, 2024

QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning

Aug 31, 2023

Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations

Oct 27, 2022

A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS

Sep 22, 2022

A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS

Mar 22, 2022