Picture for Dongmei Wang

Dongmei Wang

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

Add code
May 28, 2024
Viaarxiv icon

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

Add code
Apr 10, 2024
Viaarxiv icon

Profile-Error-Tolerant Target-Speaker Voice Activity Detection

Add code
Sep 21, 2023
Figure 1 for Profile-Error-Tolerant Target-Speaker Voice Activity Detection
Figure 2 for Profile-Error-Tolerant Target-Speaker Voice Activity Detection
Figure 3 for Profile-Error-Tolerant Target-Speaker Voice Activity Detection
Figure 4 for Profile-Error-Tolerant Target-Speaker Voice Activity Detection
Viaarxiv icon

Adapting Multi-Lingual ASR Models for Handling Multiple Talkers

Add code
May 30, 2023
Figure 1 for Adapting Multi-Lingual ASR Models for Handling Multiple Talkers
Figure 2 for Adapting Multi-Lingual ASR Models for Handling Multiple Talkers
Figure 3 for Adapting Multi-Lingual ASR Models for Handling Multiple Talkers
Figure 4 for Adapting Multi-Lingual ASR Models for Handling Multiple Talkers
Viaarxiv icon

Target Sound Extraction with Variable Cross-modality Clues

Add code
Mar 15, 2023
Figure 1 for Target Sound Extraction with Variable Cross-modality Clues
Figure 2 for Target Sound Extraction with Variable Cross-modality Clues
Figure 3 for Target Sound Extraction with Variable Cross-modality Clues
Figure 4 for Target Sound Extraction with Variable Cross-modality Clues
Viaarxiv icon

Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization

Add code
Aug 27, 2022
Figure 1 for Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization
Figure 2 for Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization
Figure 3 for Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization
Figure 4 for Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization
Viaarxiv icon

Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation

Add code
Apr 07, 2022
Figure 1 for Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation
Figure 2 for Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation
Figure 3 for Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation
Figure 4 for Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation
Viaarxiv icon

PickNet: Real-Time Channel Selection for Ad Hoc Microphone Arrays

Add code
Jan 24, 2022
Figure 1 for PickNet: Real-Time Channel Selection for Ad Hoc Microphone Arrays
Figure 2 for PickNet: Real-Time Channel Selection for Ad Hoc Microphone Arrays
Figure 3 for PickNet: Real-Time Channel Selection for Ad Hoc Microphone Arrays
Figure 4 for PickNet: Real-Time Channel Selection for Ad Hoc Microphone Arrays
Viaarxiv icon

VarArray: Array-Geometry-Agnostic Continuous Speech Separation

Add code
Oct 26, 2021
Figure 1 for VarArray: Array-Geometry-Agnostic Continuous Speech Separation
Figure 2 for VarArray: Array-Geometry-Agnostic Continuous Speech Separation
Figure 3 for VarArray: Array-Geometry-Agnostic Continuous Speech Separation
Viaarxiv icon

All-neural beamformer for continuous speech separation

Add code
Oct 13, 2021
Figure 1 for All-neural beamformer for continuous speech separation
Figure 2 for All-neural beamformer for continuous speech separation
Figure 3 for All-neural beamformer for continuous speech separation
Figure 4 for All-neural beamformer for continuous speech separation
Viaarxiv icon