Picture for Jinyu Li

Jinyu Li

Fred

Autoregressive Speech Synthesis without Vector Quantization

Add code
Jul 11, 2024
Viaarxiv icon

Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation

Add code
Jun 12, 2024
Viaarxiv icon

VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment

Add code
Jun 12, 2024
Figure 1 for VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment
Figure 2 for VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment
Figure 3 for VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment
Figure 4 for VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment
Viaarxiv icon

An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS

Add code
Jun 09, 2024
Viaarxiv icon

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

Add code
Jun 08, 2024
Figure 1 for VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
Figure 2 for VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
Figure 3 for VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
Figure 4 for VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
Viaarxiv icon

Total-Duration-Aware Duration Modeling for Text-to-Speech Systems

Add code
Jun 06, 2024
Figure 1 for Total-Duration-Aware Duration Modeling for Text-to-Speech Systems
Figure 2 for Total-Duration-Aware Duration Modeling for Text-to-Speech Systems
Figure 3 for Total-Duration-Aware Duration Modeling for Text-to-Speech Systems
Figure 4 for Total-Duration-Aware Duration Modeling for Text-to-Speech Systems
Viaarxiv icon

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

Add code
May 28, 2024
Viaarxiv icon

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

Add code
Apr 10, 2024
Viaarxiv icon

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Add code
Apr 06, 2024
Viaarxiv icon

WavLLM: Towards Robust and Adaptive Speech Large Language Model

Add code
Mar 31, 2024
Figure 1 for WavLLM: Towards Robust and Adaptive Speech Large Language Model
Figure 2 for WavLLM: Towards Robust and Adaptive Speech Large Language Model
Figure 3 for WavLLM: Towards Robust and Adaptive Speech Large Language Model
Figure 4 for WavLLM: Towards Robust and Adaptive Speech Large Language Model
Viaarxiv icon