Boxuan Cao

Unveiling the Best Practices for Applying Speech Foundation Models to Speech Intelligibility Prediction for Hearing-Impaired People

May 13, 2025

Haoshuai Zhou, Boxuan Cao, Changgeng Mo, Linkai Li, Shan Xiang Wang

Abstract: Speech foundation models (SFMs) have demonstrated strong performance across a variety of downstream tasks, including speech intelligibility prediction for hearing-impaired people (SIP-HI). However, optimizing SFMs for SIP-HI has been insufficiently explored. In this paper, we conduct a comprehensive study to identify key design factors affecting SIP-HI performance with 5 SFMs, focusing on encoder layer selection, prediction head architecture, and ensemble configurations. Our findings show that, contrary to traditional use-all-layers methods, selecting a single encoder layer yields better results. Additionally, temporal modeling is crucial for effective prediction heads. We also demonstrate that ensembling multiple SFMs improves performance, with stronger individual models providing greater benefit. Finally, we explore the relationship between key SFM attributes and their impact on SIP-HI performance. Our study offers practical insights into effectively adapting SFMs for speech intelligibility prediction for hearing-impaired populations.

Hyperspectral Image Restoration and Super-resolution with Physics-Aware Deep Learning for Biomedical Applications

Mar 03, 2025

Yuchen Xiang, Zhaolu Liu, Monica Emili Garcia-Segura, Daniel Simon, Boxuan Cao, Vincen Wu, Kenneth Robinson, Yu Wang, Ronan Battle, Robert T. Murray (+3 more)

Abstract: Hyperspectral imaging is a powerful bioimaging tool which can uncover novel insights, thanks to its sensitivity to the intrinsic properties of materials. However, this enhanced contrast comes at the cost of system complexity, constrained by an inherent trade-off between spatial resolution, spectral resolution, and imaging speed. To overcome this limitation, we present a deep learning-based approach that restores and enhances pixel resolution post-acquisition without any a priori knowledge. Fine-tuned using metrics aligned with the imaging model, our physics-aware method achieves a 16X pixel super-resolution enhancement and a 12X imaging speedup without the need for additional training data for transfer learning. Applied to both synthetic and experimental data from five different sample types, we demonstrate that the model preserves biological integrity, ensuring no features are lost or hallucinated. We also concretely demonstrate the model's ability to reveal disease-associated metabolic changes in Down syndrome that would otherwise remain undetectable. Furthermore, we provide physical insights into the inner workings of the model, paving the way for future refinements that could potentially surpass instrumental limits in an explainable manner. All methods are available as open-source software on GitHub.
