Zexu Pan

Beyond Lips: Integrating Gesture and Lip Cues for Robust Audio-visual Speaker Extraction

Jan 27, 2026
Via arXiv

LuSeeL: Language-queried Binaural Universal Sound Event Extraction and Localization

Jan 27, 2026
Via arXiv

FlowSE-GRPO: Training Flow Matching Speech Enhancement via Online Reinforcement Learning

Jan 23, 2026
Via arXiv

FunAudio-ASR Technical Report

Sep 15, 2025
Via arXiv

ClearerVoice-Studio: Bridging Advanced Speech Processing Research and Practical Deployment

Jun 24, 2025
Via arXiv

Plug-and-Play Co-Occurring Face Attention for Robust Audio-Visual Speaker Extraction

May 27, 2025
Via arXiv

Causal Self-supervised Pretrained Frontend with Predictive Code for Speech Separation

Apr 03, 2025
Via arXiv

Context-Aware Two-Step Training Scheme for Domain Invariant Speech Separation

Mar 16, 2025
Via arXiv

Conditional Latent Diffusion-Based Speech Enhancement Via Dual Context Learning

Jan 17, 2025
Via arXiv

HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution

Jan 17, 2025
Via arXiv