Picture for Helin Wang

Helin Wang

SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline

Add code
May 25, 2025
Viaarxiv icon

Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits

Add code
May 20, 2025
Viaarxiv icon

Audio Large Language Models Can Be Descriptive Speech Quality Evaluators

Add code
Jan 27, 2025
Viaarxiv icon

EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer

Add code
Sep 17, 2024
Viaarxiv icon

SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer

Add code
Sep 12, 2024
Viaarxiv icon

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis

Add code
Sep 11, 2024
Figure 1 for SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
Figure 2 for SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
Figure 3 for SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
Figure 4 for SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
Viaarxiv icon

DreamVoice: Text-Guided Voice Conversion

Add code
Jun 24, 2024
Figure 1 for DreamVoice: Text-Guided Voice Conversion
Figure 2 for DreamVoice: Text-Guided Voice Conversion
Figure 3 for DreamVoice: Text-Guided Voice Conversion
Viaarxiv icon

Noise-robust Speech Separation with Fast Generative Correction

Add code
Jun 11, 2024
Viaarxiv icon

Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback

Add code
Jun 02, 2024
Viaarxiv icon

Asynchronous and Segmented Bidirectional Encoding for NMT

Add code
Feb 19, 2024
Viaarxiv icon