
Yanhua Long

Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR
Jan 04, 2026

A Language-Agnostic Hierarchical LoRA-MoE Architecture for CTC-based Multilingual ASR
Jan 02, 2026

Lightweight speech enhancement guided target speech extraction in noisy multi-speaker scenarios
Aug 27, 2025

Unified Architecture and Unsupervised Speech Disentanglement for Speaker Embedding-Free Enrollment in Personalized Speech Enhancement
May 18, 2025

Exploring the Potential of SSL Models for Sound Event Detection
May 17, 2025

SEF-PNet: Speaker Encoder-Free Personalized Speech Enhancement with Local and Global Contexts Aggregation
Jan 20, 2025

ICSD: An Open-source Dataset for Infant Cry and Snoring Detection
Aug 20, 2024

Autoencoder with Group-based Decoder and Multi-task Optimization for Anomalous Sound Detection
Nov 15, 2023

UNISOUND System for VoxCeleb Speaker Recognition Challenge 2023
Aug 24, 2023

Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition
Jun 20, 2023