Picture for Longbiao Wang

Longbiao Wang

ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations

Add code
Dec 22, 2023
Figure 1 for ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Figure 2 for ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Figure 3 for ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Figure 4 for ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Viaarxiv icon

Multi-Level Knowledge Distillation for Speech Emotion Recognition in Noisy Conditions

Add code
Dec 21, 2023
Figure 1 for Multi-Level Knowledge Distillation for Speech Emotion Recognition in Noisy Conditions
Figure 2 for Multi-Level Knowledge Distillation for Speech Emotion Recognition in Noisy Conditions
Figure 3 for Multi-Level Knowledge Distillation for Speech Emotion Recognition in Noisy Conditions
Figure 4 for Multi-Level Knowledge Distillation for Speech Emotion Recognition in Noisy Conditions
Viaarxiv icon

High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models

Add code
Sep 27, 2023
Viaarxiv icon

Learning Speech Representation From Contrastive Token-Acoustic Pretraining

Add code
Sep 06, 2023
Viaarxiv icon

Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding

Add code
Jul 28, 2023
Viaarxiv icon

Rethinking the visual cues in audio-visual speaker extraction

Add code
Jun 05, 2023
Viaarxiv icon

speech and noise dual-stream spectrogram refine network with speech distortion loss for robust speech recognition

Add code
May 30, 2023
Viaarxiv icon

Locate and Beamform: Two-dimensional Locating All-neural Beamformer for Multi-channel Speech Separation

Add code
May 18, 2023
Viaarxiv icon

Time-domain Speech Enhancement Assisted by Multi-resolution Frequency Encoder and Decoder

Add code
Mar 26, 2023
Figure 1 for Time-domain Speech Enhancement Assisted by Multi-resolution Frequency Encoder and Decoder
Figure 2 for Time-domain Speech Enhancement Assisted by Multi-resolution Frequency Encoder and Decoder
Figure 3 for Time-domain Speech Enhancement Assisted by Multi-resolution Frequency Encoder and Decoder
Figure 4 for Time-domain Speech Enhancement Assisted by Multi-resolution Frequency Encoder and Decoder
Viaarxiv icon

Cross-modal Audio-visual Co-learning for Text-independent Speaker Verification

Add code
Feb 22, 2023
Figure 1 for Cross-modal Audio-visual Co-learning for Text-independent Speaker Verification
Figure 2 for Cross-modal Audio-visual Co-learning for Text-independent Speaker Verification
Figure 3 for Cross-modal Audio-visual Co-learning for Text-independent Speaker Verification
Figure 4 for Cross-modal Audio-visual Co-learning for Text-independent Speaker Verification
Viaarxiv icon