Pengyuan Zhang

Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation

Jul 07, 2024

TRNet: Two-level Refinement Network leveraging Speech Enhancement for Noise Robust Speech Emotion Recognition

Apr 19, 2024

Modality-Collaborative Transformer with Hybrid Feature Reconstruction for Robust Emotion Recognition

Dec 26, 2023

DSNet: Disentangled Siamese Network with Neutral Calibration for Speech Emotion Recognition

Dec 25, 2023

Enhancing Spoofing Speech Detection Using Rhythm Information

Oct 18, 2023

Synthetic Speech Detection Based on Temporal Consistency and Distribution of Speaker Features

Sep 29, 2023

The Impact of Silence on Speech Anti-Spoofing

Sep 21, 2023

One-Class Knowledge Distillation for Spoofing Speech Detection

Sep 15, 2023

Improving Short Utterance Anti-Spoofing with AASIST2

Sep 15, 2023

Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder

Sep 02, 2023