Shilei Zhang

SqueezeComposer: Temporal Speed-up is A Simple Trick for Long-form Music Composing

Mar 22, 2026

B-GRPO: Unsupervised Speech Emotion Recognition based on Batched-Group Relative Policy Optimization

Feb 06, 2026

OneVoice: One Model, Triple Scenarios-Towards Unified Zero-shot Voice Conversion

Jan 26, 2026

DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles

Dec 04, 2024

VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark

Jul 16, 2024

On Calibration of Speech Classification Models: Insights from Energy-Based Model Investigations

Jun 26, 2024

Exploring Energy-Based Models for Out-of-Distribution Detection in Dialect Identification

Jun 26, 2024

CEC: A Noisy Label Detection Method for Speaker Recognition

Jun 19, 2024

PolySpeech: Exploring Unified Multitask Speech Models for Competitiveness with Single-task Models

Jun 12, 2024

Plugin Speech Enhancement: A Universal Speech Enhancement Framework Inspired by Dynamic Neural Network

Feb 20, 2024