
Publications by Songxiang Liu

NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS

Nov 04, 2022

Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation

Feb 18, 2022

DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

Jan 28, 2022

Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning

Nov 14, 2021

Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis

Sep 08, 2021

ASR-GLUE: A New Multi-task Benchmark for ASR-Robust Natural Language Understanding

Aug 30, 2021

DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion

May 28, 2021

VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention

Feb 12, 2021

Any-to-Many Voice Conversion with Location-Relative Sequence-to-Sequence Modeling

Sep 06, 2020

Defense against adversarial attacks on spoofing countermeasures of ASV

Mar 06, 2020