Xu Tan

BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec

Sep 09, 2024
Via arXiv

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Aug 30, 2024
Via arXiv

Foundation Models for Music: A Survey

Aug 27, 2024
Via arXiv

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

Jun 26, 2024
Via arXiv

EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms

Jun 20, 2024
Via arXiv

UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner

Jun 14, 2024
Via arXiv

Weakly-supervised anomaly detection for multimodal data distributions

Jun 13, 2024
Via arXiv

Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement

Jun 12, 2024
Via arXiv

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

Jun 08, 2024
Via arXiv

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

Jun 06, 2024
Via arXiv