Xiaofei Li

IPDnet2: an efficient and improved inter-channel phase difference estimation network for sound source localization

Sep 26, 2025
Via arXiv

Rec-RIR: Monaural Blind Room Impulse Response Identification via DNN-based Reverberant Speech Reconstruction in STFT Domain

Sep 19, 2025
Via arXiv

UMA-Split: unimodal aggregation for both English and Mandarin non-autoregressive speech recognition

Sep 18, 2025
Via arXiv

A Composite Predictive-Generative Approach to Monaural Universal Speech Enhancement

May 30, 2025
Via arXiv

Mel-McNet: A Mel-Scale Framework for Online Multichannel Speech Enhancement

May 26, 2025
Via arXiv

Bridging VLM and KMP: Enabling Fine-grained robotic manipulation via Semantic Keypoints Representation

Mar 04, 2025
Via arXiv

CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR

Feb 27, 2025
Via arXiv

VINP: Variational Bayesian Inference with Neural Speech Prior for Joint ASR-Effective Speech Dereverberation and Blind RIR Identification

Feb 11, 2025
Via arXiv

LS-EEND: Long-Form Streaming End-to-End Neural Diarization with Online Attractor Extraction

Oct 09, 2024
Via arXiv

RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization

Jun 28, 2024
Via arXiv