Xize Cheng

SegTalker: Segmentation-based Talking Face Generation with Mask-guided Local Editing

Sep 05, 2024, via arXiv

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

Aug 29, 2024, via arXiv

Landmark-guided Diffusion Model for High-fidelity and Temporally Coherent Talking Head Generation

Aug 03, 2024, via arXiv

OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces

Jul 16, 2024, via arXiv

ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling

Jun 25, 2024, via arXiv

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

Jun 03, 2024, via arXiv

AudioLCM: Text-to-Audio Generation with Latent Consistency Models

Jun 01, 2024, via arXiv

FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion

May 10, 2024, via arXiv

Molecule-Space: Free Lunch in Unified Multimodal Space via Knowledge Fusion

May 08, 2024, via arXiv

Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment

Apr 16, 2024, via arXiv