
Jinyu Li

Fred

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

Jun 08, 2024

Total-Duration-Aware Duration Modeling for Text-to-Speech Systems

Jun 06, 2024

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

May 28, 2024

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

Apr 10, 2024

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Apr 06, 2024

WavLLM: Towards Robust and Adaptive Speech Large Language Model

Mar 31, 2024

Advanced Long-Content Speech Recognition With Factorized Neural Transducer

Mar 20, 2024

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Mar 05, 2024

Boosting Large Language Model for Speech Synthesis: An Empirical Study

Dec 30, 2023

COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning

Nov 03, 2023