Jangyeon Kim

Deep Filter Estimation from Inter-Frame Correlations for Monaural Speech Dereverberation

Mar 16, 2026

Ui-Hyeop Shin, Jun Hyung Kim, Jangyeon Kim, Wooseok Kim, Hyung-Min Park

Abstract: Speech dereverberation in distant-microphone scenarios remains challenging due to the high correlation between reverberation and target signals, often leading to poor generalization in real-world environments. We propose IF-CorrNet, a correlation-to-filter architecture designed for robustness against acoustic variability. Unlike conventional black-box mapping methods that directly estimate complex spectra, IF-CorrNet explicitly exploits inter-frame STFT correlations to estimate multi-frame deep filters for each time-frequency bin. By shifting the learning objective from direct mapping to filter estimation, the network effectively constrains the solution space, which simplifies the training process and mitigates overfitting to synthetic data. Experimental results on the REVERB Challenge dataset demonstrate that IF-CorrNet achieves a substantial gain in the SRMR metric on RealData, confirming its robustness in suppressing reverberation and noise in practical, non-synthetic environments.

* Submitted for review to Interspeech
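
The abstract above describes estimating a multi-frame filter for each time-frequency bin from inter-frame STFT correlations. As a rough illustration of what those two quantities could look like, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the function names `inter_frame_correlations` and `apply_deep_filter`, the nested-list layout `stft[t][f]`, the instantaneous (unaveraged) correlation, and the filtering convention without conjugated taps. The network that maps correlations to filters is not shown, and this is not the IF-CorrNet implementation.

```python
# Illustrative sketch only; names, data layout, and conventions are assumptions,
# not taken from the IF-CorrNet paper.

def inter_frame_correlations(stft, num_taps):
    """Return corr[t][f][k] = stft[t][f] * conj(stft[t - k][f]) for k in 0..num_taps-1.

    Frames before the start of the signal are treated as zero.
    """
    num_frames, num_bins = len(stft), len(stft[0])
    corr = []
    for t in range(num_frames):
        row = []
        for f in range(num_bins):
            taps = []
            for k in range(num_taps):
                past = stft[t - k][f] if t - k >= 0 else 0j
                taps.append(stft[t][f] * past.conjugate())
            row.append(taps)
        corr.append(row)
    return corr


def apply_deep_filter(stft, filters):
    """Apply a per-bin multi-frame filter.

    filters[t][f] is a list of complex taps; the output at (t, f) is
    sum_k filters[t][f][k] * stft[t - k][f], skipping frames before the start.
    """
    num_frames, num_bins = len(stft), len(stft[0])
    out = [[0j] * num_bins for _ in range(num_frames)]
    for t in range(num_frames):
        for f in range(num_bins):
            acc = 0j
            for k, tap in enumerate(filters[t][f]):
                if t - k >= 0:
                    acc += tap * stft[t - k][f]
            out[t][f] = acc
    return out


# Toy input: three frames, one frequency bin, and a fixed two-tap filter
# standing in for the filters a network would estimate.
stft = [[1 + 0j], [2 + 0j], [3 + 0j]]
filters = [[[1 + 0j, -0.5 + 0j]] for _ in stft]
corr = inter_frame_correlations(stft, num_taps=2)
enhanced = apply_deep_filter(stft, filters)
```

In this toy setup the filter subtracts half of the previous frame from the current one, which mimics removing a decaying tail. In a correlation-to-filter design, a learned model would take features like `corr` as input and produce `filters` as output.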

Stack Less, Repeat More: A Block Reusing Approach for Progressive Speech Enhancement

May 26, 2025

Jangyeon Kim, Ui-Hyeop Shin, Jaehyun Ko, Hyung-Min Park

Abstract: This paper presents an efficient speech enhancement (SE) approach that reuses a processing block repeatedly instead of conventional stacking. Rather than increasing the number of blocks for learning deep latent representations, repeating a single block leads to progressive refinement while reducing parameter redundancy. We also minimize domain transformation by keeping an encoder and decoder shallow and reusing a single sequence modeling block. Experimental results show that the number of processing stages is more critical to performance than the number of blocks with different weights. Also, we observed that the proposed method gradually refines a noisy input within a single block. Furthermore, with the block reuse method, we demonstrate that deepening the encoder and decoder can be redundant for learning deep complex representation. Therefore, the experimental results confirm that the proposed block reusing enables progressive learning and provides an efficient alternative for SE.

* Accepted to Interspeech 2025
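
The distinction the abstract draws between stacking blocks with different weights and repeating one block can be sketched with a toy example. The `Block` class, its two scalar parameters, and the helper names below are hypothetical stand-ins chosen for illustration; the paper's actual sequence modeling block, encoder, and decoder are not reproduced here.

```python
import math


class Block:
    """Toy residual block: y = x + scale * tanh(x) + shift, applied elementwise."""

    def __init__(self, scale, shift):
        self.scale = scale
        self.shift = shift

    def num_params(self):
        return 2  # scale and shift

    def __call__(self, x):
        return [v + self.scale * math.tanh(v) + self.shift for v in x]


def run_stacked(blocks, x):
    """Conventional stacking: each stage may have its own weights."""
    for block in blocks:
        x = block(x)
    return x


def run_reused(block, x, num_stages):
    """Block reuse: the same weights are applied at every stage."""
    for _ in range(num_stages):
        x = block(x)
    return x


def count_unique_params(blocks):
    """Count parameters once per distinct block object."""
    unique = {id(b): b for b in blocks}
    return sum(b.num_params() for b in unique.values())


noisy = [0.5, -1.0, 2.0]
shared = Block(scale=-0.1, shift=0.0)
stacked = [Block(scale=-0.1, shift=0.0) for _ in range(4)]

reused_out = run_reused(shared, noisy, num_stages=4)
stacked_params = count_unique_params(stacked)      # four independent blocks
reused_params = count_unique_params([shared] * 4)  # one block, four stages
```

Both variants run four processing stages, but the reused variant stores the parameters of a single block. That is the trade-off the abstract refers to when it says the number of stages matters more than the number of blocks with different weights.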
