Yanmin Qian

WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction

Sep 24, 2024

Improving Anomalous Sound Detection via Low-Rank Adaptation Fine-Tuning of Pre-Trained Audio Models

Sep 11, 2024

Disentangling the Prosody and Semantic Information with Pre-trained Model for In-Context Learning based Zero-Shot Voice Conversion

Sep 10, 2024

Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching

Sep 07, 2024

Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning

Jul 21, 2024

Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement

Jun 19, 2024

AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection

Jun 17, 2024

Generating Speakers by Prompting Listener Impressions for Pre-trained Multi-Speaker Text-to-Speech Systems

Jun 13, 2024

Target Speech Diarization with Multimodal Prompts

Jun 11, 2024

Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization

Jun 08, 2024