Picture for Xize Cheng

Xize Cheng

T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback

Add code
May 15, 2025
Viaarxiv icon

WavReward: Spoken Dialogue Models With Generalist Reward Evaluators

Add code
May 14, 2025
Viaarxiv icon

Unleashing the Power of Natural Audio Featuring Multiple Sound Sources

Add code
Apr 24, 2025
Viaarxiv icon

Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model

Add code
Feb 08, 2025
Figure 1 for Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model
Figure 2 for Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model
Figure 3 for Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model
Figure 4 for Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model
Viaarxiv icon

OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios

Add code
Jan 02, 2025
Figure 1 for OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios
Figure 2 for OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios
Figure 3 for OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios
Figure 4 for OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios
Viaarxiv icon

A Wander Through the Multimodal Landscape: Efficient Transfer Learning via Low-rank Sequence Multimodal Adapter

Add code
Dec 12, 2024
Viaarxiv icon

WavChat: A Survey of Spoken Dialogue Models

Add code
Nov 26, 2024
Figure 1 for WavChat: A Survey of Spoken Dialogue Models
Figure 2 for WavChat: A Survey of Spoken Dialogue Models
Figure 3 for WavChat: A Survey of Spoken Dialogue Models
Figure 4 for WavChat: A Survey of Spoken Dialogue Models
Viaarxiv icon

OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup

Add code
Oct 28, 2024
Figure 1 for OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
Figure 2 for OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
Figure 3 for OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
Figure 4 for OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
Viaarxiv icon

MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization

Add code
Oct 16, 2024
Figure 1 for MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization
Figure 2 for MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization
Figure 3 for MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization
Figure 4 for MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization
Viaarxiv icon

SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing

Add code
Sep 05, 2024
Figure 1 for SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing
Figure 2 for SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing
Figure 3 for SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing
Figure 4 for SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing
Viaarxiv icon