Pengyuan Zhang

Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation

Jan 27, 2025
Via arXiv

Controlling your Attributes in Voice

Jan 03, 2025
Via arXiv

SLIDE: Integrating Speech Language Model with LLM for Spontaneous Spoken Dialogue Generation

Jan 01, 2025
Via arXiv

Transliterated Zero-Shot Domain Adaptation for Automatic Speech Recognition

Dec 15, 2024
Via arXiv

SF-Speech: Straightened Flow for Zero-Shot Voice Clone on Small-Scale Dataset

Oct 16, 2024
Via arXiv

Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation

Jul 07, 2024
Via arXiv

TRNet: Two-level Refinement Network leveraging Speech Enhancement for Noise Robust Speech Emotion Recognition

Apr 19, 2024
Via arXiv

Modality-Collaborative Transformer with Hybrid Feature Reconstruction for Robust Emotion Recognition

Dec 26, 2023
Via arXiv

DSNet: Disentangled Siamese Network with Neutral Calibration for Speech Emotion Recognition

Dec 25, 2023
Via arXiv

Enhancing Spoofing Speech Detection Using Rhythm Information

Oct 18, 2023
Via arXiv