Zhou Zhao

Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis

Feb 26, 2025
Via arXiv

CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale

Feb 23, 2025
Via arXiv

WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models

Feb 20, 2025
Via arXiv

EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration

Feb 20, 2025
Via arXiv

Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model

Feb 08, 2025
Via arXiv

Low-rank Prompt Interaction for Continual Vision-Language Retrieval

Jan 24, 2025
Via arXiv

OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios

Jan 02, 2025
Via arXiv

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

Dec 24, 2024
Via arXiv

FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation

Dec 22, 2024
Via arXiv

Speech Watermarking with Discrete Intermediate Representations

Dec 18, 2024
Via arXiv