
Lirong Dai

LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion with Inference Acceleration via Latent Consistency Distillation

Aug 22, 2024

LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice Conversion with Singer Guidance

Jun 08, 2024

Adversarial speech for voice privacy protection from Personalized Speech generation

Jan 22, 2024

Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation

Jan 07, 2024

Rep2wav: Noise Robust text-to-speech Using self-supervised representations

Sep 04, 2023

VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

Nov 21, 2022

SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder Based Speech-Text Pre-training

Oct 07, 2022

SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data

Sep 30, 2022

Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data

Mar 31, 2022

The USTC-NELSLIP Systems for Simultaneous Speech Translation Task at IWSLT 2021

Jul 09, 2021