Rongjie Huang

MEDIC: Zero-shot Music Editing with Disentangled Inversion Control

Jul 18, 2024
Via arXiv

OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces

Jul 16, 2024
Via arXiv

Accompanied Singing Voice Synthesis with Fully Text-controlled Melody

Jul 02, 2024
Via arXiv

UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner

Jun 14, 2024
Via arXiv

Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion

Jun 04, 2024
Via arXiv

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

Jun 03, 2024
Via arXiv

AudioLCM: Text-to-Audio Generation with Latent Consistency Models

Jun 01, 2024
Via arXiv

Frieren: Efficient Video-to-Audio Generation with Rectified Flow Matching

Jun 01, 2024
Via arXiv

Robust Singing Voice Transcription Serves Synthesis

May 16, 2024
Via arXiv

FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion

May 10, 2024
Via arXiv