Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jihui Zhang

BiFormer3D: Grid-Free Time-Domain Reconstruction of Head-Related Impulse Responses with a Spatially Encoded Transformer

Mar 30, 2026

Shaoheng Xu, Chunyi Sun, Jihui Zhang, Amy Bastine, Prasanga N. Samarasinghe, Thushara D. Abhayapala, Hongdong Li

Abstract:Individualized head-related impulse responses (HRIRs) enable binaural rendering, but dense per-listener measurements are costly. We address HRIR spatial up-sampling from sparse per-listener measurements: given a few measured HRIRs for a listener, predict HRIRs at unmeasured target directions. Prior learning methods often work in the frequency domain, rely on minimum-phase assumptions or separate timing models, and use a fixed direction grid, which can degrade temporal fidelity and spatial continuity. We propose BiFormer3D, a time-domain, grid-free binaural Transformer for reconstructing HRIRs at arbitrary directions from sparse inputs. It uses sinusoidal spatial features, a Conv1D refinement module, and auxiliary interaural time difference (ITD) and interaural level difference (ILD) heads. On SONICOM, it improves normalized mean squared error (NMSE), cosine distance, and ITD/ILD errors over prior methods; ablations validate modules and show minimum-phase pre-processing is unnecessary.

* The paper was submitted for review to Interspeech 2026

Via

Access Paper or Ask Questions

RIR-Former: Coordinate-Guided Transformer for Continuous Reconstruction of Room Impulse Responses

Feb 03, 2026

Shaoheng Xu, Chunyi Sun, Jihui Zhang, Prasanga N. Samarasinghe, Thushara D. Abhayapala

Abstract:Room impulse responses (RIRs) are essential for many acoustic signal processing tasks, yet measuring them densely across space is often impractical. In this work, we propose RIR-Former, a grid-free, one-step feed-forward model for RIR reconstruction. By introducing a sinusoidal encoding module into a transformer backbone, our method effectively incorporates microphone position information, enabling interpolation at arbitrary array locations. Furthermore, a segmented multi-branch decoder is designed to separately handle early reflections and late reverberation, improving reconstruction across the entire RIR. Experiments on diverse simulated acoustic environments demonstrate that RIR-Former consistently outperforms state-of-the-art baselines in terms of normalized mean square error (NMSE) and cosine distance (CD), under varying missing rates and array configurations. These results highlight the potential of our approach for practical deployment and motivate future work on scaling from randomly spaced linear arrays to complex array geometries, dynamic acoustic scenes, and real-world environments.

* Accepted to International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2026. Equal contribution: Shaoheng Xu and Chunyi Sun

Via

Access Paper or Ask Questions

GMM based multi-stage Wiener filtering for low SNR speech enhancement

Jun 19, 2022

Wageesha Manamperi, Prasanga N. Samarasinghe, Thushara D. Abhayapala, Jihui Zhang

Figure 1 for GMM based multi-stage Wiener filtering for low SNR speech enhancement

Figure 2 for GMM based multi-stage Wiener filtering for low SNR speech enhancement

Figure 3 for GMM based multi-stage Wiener filtering for low SNR speech enhancement

Abstract:This paper proposes a single-channel speech enhancement method to reduce the noise and enhance speech at low signal-to-noise ratio (SNR) levels and non-stationary noise conditions. Specifically, we focus on modeling the noise using a Gaussian mixture model (GMM) based on a multi-stage process with a parametric Wiener filter. The proposed noise model estimates a more accurate noise power spectral density (PSD), and allows for better generalization under various noise conditions compared to traditional Wiener filtering methods. Simulations show that the proposed approach can achieve better performance in terms of speech quality (PESQ) and intelligibility (STOI) at low SNR levels.

* 5 pages, 3 figures, submitted to a conference

Via

Access Paper or Ask Questions