Zhou Zhao

MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis

Jul 19, 2024

MEDIC: Zero-shot Music Editing with Disentangled Inversion Control

Jul 18, 2024

OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces

Jul 16, 2024

Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition

Jul 07, 2024

Accompanied Singing Voice Synthesis with Fully Text-controlled Melody

Jul 02, 2024

ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling

Jun 25, 2024

EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration

Jun 20, 2024

Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion

Jun 04, 2024

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

Jun 03, 2024

Frieren: Efficient Video-to-Audio Generation with Rectified Flow Matching

Jun 01, 2024