Zhou Zhao

WavChat: A Survey of Spoken Dialogue Models

Nov 26, 2024

MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence

Nov 04, 2024

Classifier-guided Gradient Modulation for Enhanced Multimodal Learning

Nov 03, 2024

OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup

Oct 28, 2024

A Comprehensive Survey of Datasets, Theories, Variants, and Applications in Direct Preference Optimization

Oct 21, 2024

FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation

Oct 16, 2024

MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization

Oct 16, 2024

MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes

Oct 09, 2024

Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models

Sep 28, 2024

GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks

Sep 26, 2024