Abstract: Deep neural networks (DNNs) often struggle to suppress noise at low signal-to-noise ratios (SNRs). This paper addresses speech enhancement in scenarios dominated by harmonic noise and proposes a framework that integrates cyclostationarity-aware preprocessing with lightweight DNN-based denoising. A cyclic minimum power distortionless response (cMPDR) spectral beamformer is used as a preprocessing block. It exploits the spectral correlations of cyclostationary noise to suppress harmonic components prior to learning-based enhancement and does not require modifications to the DNN architecture. The proposed pipeline is evaluated in a single-channel setting using two DNN architectures: a simple, lightweight convolutional recurrent neural network (CRNN) and a state-of-the-art model, the ultra-low complexity network (ULCNet). Experiments on synthetic data and real-world recordings dominated by rotating machinery noise demonstrate consistent improvements over end-to-end DNN baselines, particularly at low SNRs. Notably, a parameter-efficient CRNN with cMPDR preprocessing outperforms the larger ULCNet operating on raw or Wiener-filtered inputs. These results indicate that explicitly incorporating cyclostationarity as a signal prior is more effective than increasing model capacity alone for suppressing harmonic interference.
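
To make the preprocessing idea concrete, the following is a minimal sketch of the standard MPDR weight formula, w = R^{-1} d / (d^H R^{-1} d), applied at a single frequency bin. It is not the authors' implementation. The stacking of a reference component with two noise-correlated auxiliary components, the steering vector d = [1, 0, 0] (which assumes the target is uncorrelated with the auxiliary components), the diagonal loading, and all names and toy values are assumptions made for illustration.

```python
import numpy as np

def mpdr_weights(R, d, loading=1e-6):
    """Return MPDR weights w = R^{-1} d / (d^H R^{-1} d)."""
    M = R.shape[0]
    # Diagonal loading (an assumption of this sketch) keeps the solve well conditioned.
    R_loaded = R + loading * (np.trace(R).real / M) * np.eye(M)
    Rinv_d = np.linalg.solve(R_loaded, d)
    return Rinv_d / (d.conj() @ Rinv_d)

rng = np.random.default_rng(0)
T = 4000  # number of frames (hypothetical)

def cgauss(scale, size):
    """Circular complex Gaussian samples with standard deviation `scale`."""
    return scale * (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)

s = cgauss(1.0, T)  # toy target component at the reference bin
c = cgauss(1.0, T)  # toy harmonic-noise component shared across components

# Row 0: reference (target + noise). Rows 1-2: auxiliary components that are
# correlated with the noise only, standing in for spectrally correlated bins.
X = np.stack([
    s + c,
    0.8 * c + cgauss(0.1, T),
    -0.5j * c + cgauss(0.1, T),
])

R = X @ X.conj().T / T                        # sample covariance of the stacked vector
d = np.array([1.0, 0.0, 0.0], dtype=complex)  # assumed steering vector
w = mpdr_weights(R, d)
y = w.conj() @ X                              # beamformer output, one value per frame

noise_in = np.mean(np.abs(X[0] - s) ** 2)     # noise power before filtering
noise_out = np.mean(np.abs(y - s) ** 2)       # residual noise power after filtering
```

In this toy setup the distortionless constraint w^H d = 1 leaves the target untouched, while the auxiliary rows let the filter cancel most of the shared noise component.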




Abstract: This article focuses on estimating relative transfer functions (RTFs) for beamforming applications. While traditional methods assume that spectra are uncorrelated, this assumption is often violated in practical scenarios due to natural phenomena such as the Doppler effect, artificial manipulations like time-domain windowing, or the non-stationary nature of the signals, as observed in speech. To address this, we propose an RTF estimation technique that leverages spectral and spatial correlations through subspace analysis. To overcome the challenge of estimating second-order spectral statistics for real data, we employ a phase-adjusted estimator originally proposed in the context of engine fault detection. Additionally, we derive Cram\'er--Rao bounds (CRBs) for the RTF estimation task, providing theoretical insights into the achievable estimation accuracy. The bounds show that channel estimation can be performed more accurately if the noise or the target presents spectral correlations. Experiments on real and synthetic data show that our technique outperforms the narrowband maximum-likelihood estimator when the target exhibits spectral correlations. Although the accuracy of the proposed algorithm is generally close to the bound, there is some room for improvement, especially when noise signals with high spectral correlation are present. While the applications of channel estimation are diverse, we demonstrate the method in the context of array processing for speech.
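
For orientation, here is a minimal sketch of a conventional narrowband, subspace-style RTF estimate based on covariance subtraction. It is neither the proposed technique nor necessarily the maximum-likelihood baseline mentioned above, and it ignores spectral correlations entirely. The single-bin model x = a s + n with a[ref] = 1, the availability of a noise-only covariance estimate, and all names and toy values are assumptions made for illustration.

```python
import numpy as np

def estimate_rtf_cov_subtraction(Rx, Rn, ref=0):
    """Estimate the RTF as the principal eigenvector of Rx - Rn, scaled so that entry `ref` is 1."""
    D = Rx - Rn
    D = 0.5 * (D + D.conj().T)            # enforce Hermitian symmetry
    eigvals, eigvecs = np.linalg.eigh(D)  # eigenvalues in ascending order
    u = eigvecs[:, -1]                    # eigenvector of the largest eigenvalue
    return u / u[ref]                     # removes the scale and phase ambiguity

rng = np.random.default_rng(1)
M, T = 4, 5000  # microphones and frames (hypothetical)

def cgauss(scale, size):
    """Circular complex Gaussian samples with standard deviation `scale`."""
    return scale * (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)

a_true = np.concatenate(([1.0], cgauss(1.0, M - 1)))  # toy RTF, reference entry equal to 1
s = cgauss(1.0, T)                                     # toy target at one frequency bin
noisy = np.outer(a_true, s) + cgauss(0.3, (M, T))      # target plus spatially white noise
noise_only = cgauss(0.3, (M, T))                       # separate noise-only frames

Rx = noisy @ noisy.conj().T / T
Rn = noise_only @ noise_only.conj().T / T

a_hat = estimate_rtf_cov_subtraction(Rx, Rn)
rel_err = np.linalg.norm(a_hat - a_true) / np.linalg.norm(a_true)
```

An estimator that also exploits spectral correlations, as the abstract describes, would operate on statistics spanning several frequency bins rather than on one bin at a time.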