Haoxu Wang

FLASepformer: Efficient Speech Separation with Gated Focused Linear Attention Transformer

Aug 27, 2025

Exploring Efficient Directional and Distance Cues for Regional Speech Separation

Aug 11, 2025

SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training

May 16, 2025

ZipEnhancer: Dual-Path Down-Up Sampling-based Zipformer for Monaural Speech Enhancement

Jan 09, 2025

Memory-Efficient and Secure DNN Inference on TrustZone-enabled Consumer IoT Devices

Mar 19, 2024

Robust Wake Word Spotting With Frame-Level Cross-Modal Attention Based Audio-Visual Conformer

Mar 04, 2024

LCB-net: Long-Context Biasing for Audio-Visual Speech Recognition

Jan 12, 2024

Hourglass-AVSR: Down-Up Sampling-based Computational Efficiency Model for Audio-Visual Speech Recognition

Dec 14, 2023

SlideSpeech: A Large-Scale Slide-Enriched Audio-Visual Corpus

Sep 12, 2023

The DKU Post-Challenge Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge: Deep Analysis

Mar 04, 2023