Xiaobin Rong

A Lightweight Hybrid Dual Channel Speech Enhancement System under Low-SNR Conditions

May 26, 2025

Zheng Wang, Xiaobin Rong, Yu Sun, Tianchi Sun, Zhibin Lin, Jing Lu

Abstract: Although deep learning-based multi-channel speech enhancement has achieved significant advancements, its practical deployment is often limited by constrained computational resources, particularly in low signal-to-noise ratio (SNR) conditions. In this paper, we propose a lightweight hybrid dual-channel speech enhancement system that combines independent vector analysis (IVA) with a modified version of the dual-channel grouped temporal convolutional recurrent network (GTCRN). IVA functions as a coarse estimator, providing auxiliary information for both speech and noise, while the modified GTCRN further refines the speech quality. We investigate several modifications to ensure the comprehensive utilization of both original and auxiliary information. Experimental results demonstrate the effectiveness of the proposed system, achieving enhanced speech with minimal parameters and low computational complexity.

* Accepted by Interspeech 2025

TS-URGENet: A Three-stage Universal Robust and Generalizable Speech Enhancement Network

May 24, 2025

Xiaobin Rong, Dahan Wang, Qinwen Hu, Yushi Wang, Yuxiang Hu, Jing Lu

Abstract: Universal speech enhancement aims to handle input speech with different distortions and input formats. To tackle this challenge, we present TS-URGENet, a Three-Stage Universal, Robust, and Generalizable speech Enhancement Network. To address various distortions, the proposed system employs a novel three-stage architecture consisting of a filling stage, a separation stage, and a restoration stage. The filling stage mitigates packet loss by preliminarily filling lost regions under noise interference, ensuring signal continuity. The separation stage suppresses noise, reverberation, and clipping distortion to improve speech clarity. Finally, the restoration stage compensates for bandwidth limitation, codec artifacts, and residual packet loss distortion, refining the overall speech quality. Our proposed TS-URGENet achieved outstanding performance in the Interspeech 2025 URGENT Challenge, ranking 2nd in Track 1.

* Accepted by Interspeech 2025

Adaptive Convolution for CNN-based Speech Enhancement Models

Feb 20, 2025

Dahan Wang, Xiaobin Rong, Shiruo Sun, Yuxiang Hu, Changbao Zhu, Jing Lu

Abstract: Deep learning-based speech enhancement methods have significantly improved speech quality and intelligibility. Convolutional neural networks (CNNs) have been proven to be essential components of many high-performance models. In this paper, we introduce adaptive convolution, an efficient and versatile convolutional module that enhances the model's capability to adaptively represent speech signals. Adaptive convolution performs frame-wise causal dynamic convolution, generating time-varying kernels for each frame by assembling multiple parallel candidate kernels. A lightweight attention mechanism leverages both current and historical information to assign adaptive weights to each candidate kernel, guiding their aggregation. This enables the convolution operation to adapt to frame-level speech spectral features, leading to more efficient extraction and reconstruction. Experimental results on various CNN-based models demonstrate that adaptive convolution significantly improves performance with negligible increases in computational complexity, especially for lightweight models. Furthermore, we propose the adaptive convolutional recurrent network (AdaptCRN), an ultra-lightweight model that incorporates adaptive convolution and an efficient encoder-decoder design, achieving superior performance compared to models with similar or even higher computational costs.
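
The idea of frame-wise causal dynamic convolution described in this abstract can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the function name `adaptive_conv1d`, the projection matrix `w_att`, the exponential running average used as "historical information", and the `decay` value are all assumptions made for illustration. For each frame, attention weights over K candidate kernels are computed from current and past frames only, and the candidates are mixed into a single kernel that is applied along the frequency axis.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_conv1d(x, kernels, w_att, decay=0.9):
    """Illustrative frame-wise dynamic convolution (hypothetical helper).

    x:       (T, F) spectral frames, T frames by F frequency bins
    kernels: (K, S) K candidate kernels of odd length S
    w_att:   (F, K) assumed attention projection, not from the paper
    decay:   assumed smoothing factor for the causal context
    """
    T, F = x.shape
    K, S = kernels.shape
    pad = S // 2
    out = np.zeros_like(x)
    weights = np.zeros((T, K))
    context = np.zeros(F)
    for t in range(T):
        # Causal context: running average over the current and past frames.
        context = decay * context + (1.0 - decay) * x[t]
        # One attention weight per candidate kernel; weights sum to 1.
        weights[t] = softmax(context @ w_att)
        # Assemble the time-varying kernel for this frame.
        kernel_t = weights[t] @ kernels
        # Apply it along frequency with zero padding to keep F bins.
        padded = np.pad(x[t], pad)
        out[t] = np.correlate(padded, kernel_t, mode="valid")
    return out, weights

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 8))
kernels = rng.standard_normal((4, 3))
w_att = rng.standard_normal((8, 4))
out, weights = adaptive_conv1d(x, kernels, w_att)
```

Because the kernel at frame t depends only on frames up to t, changing later frames leaves earlier outputs untouched, which is the causality property the abstract refers to. The per-frame cost on top of a static convolution is one small projection and a weighted sum of K kernels.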

* Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing
