Zhizheng Wu

AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models

May 22, 2025 (via arXiv)

Neurodyne: Neural Pitch Manipulation with Representation Learning and Cycle-Consistency GAN

May 21, 2025 (via arXiv)

DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation

May 19, 2025 (via arXiv)

SingNet: Towards a Large-Scale, Diverse, and In-the-Wild Singing Voice Dataset

May 14, 2025 (via arXiv)

Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment

May 07, 2025 (via arXiv)

Diff-SSL-G-Comp: Towards a Large-Scale and Diverse Dataset for Virtual Analog Modeling

Apr 06, 2025 (via arXiv)

Solla: Towards a Speech-Oriented LLM That Hears Acoustic Context

Mar 19, 2025 (via arXiv)

Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement

Feb 11, 2025 (via arXiv)

Metis: A Foundation Speech Generation Model with Masked Generative Pre-training

Feb 05, 2025 (via arXiv)

Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation

Jan 27, 2025 (via arXiv)