Zhizheng Wu

VoxSafeBench: Not Just What Is Said, but Who, How, and Where

Apr 16, 2026, via arXiv

MimicLM: Zero-Shot Voice Imitation through Autoregressive Modeling of Pseudo-Parallel Speech Corpora

Apr 13, 2026, via arXiv

Grounding Sim-to-Real Generalization in Dexterous Manipulation: An Empirical Study with Vision-Language-Action Models

Mar 24, 2026, via arXiv

NV-Bench: Benchmark of Nonverbal Vocalization Synthesis for Expressive Text-to-Speech Generation

Mar 16, 2026, via arXiv

WhispEar: A Bi-directional Framework for Scaling Whispered Speech Conversion via Pseudo-Parallel Whisper Generation

Mar 09, 2026, via arXiv

Anatomy of the Modality Gap: Dissecting the Internal States of End-to-End Speech LLMs

Mar 02, 2026, via arXiv

VoxPrivacy: A Benchmark for Evaluating Interactional Privacy of Speech Language Models

Jan 27, 2026, via arXiv

Linear Script Representations in Speech Foundation Models Enable Zero-Shot Transliteration

Jan 06, 2026, via arXiv

Aliasing-Free Neural Audio Synthesis

Dec 23, 2025, via arXiv

SpeechJudge: Towards Human-Level Judgment for Speech Naturalness

Nov 11, 2025, via arXiv