Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:A correlation-permutation approach for speech-music encoders model merging

Jun 13, 2025

Fabian Ritter-Gutierrez, Yi-Cheng Lin, Jeremy H. M Wong, Hung-yi Lee, Eng Siong Chng, Nancy F. Chen

Share this with someone who'll enjoy it:

Abstract:Creating a unified speech and music model requires expensive pre-training. Model merging can instead create an unified audio model with minimal computational expense. However, direct merging is challenging when the models are not aligned in the weight space. Motivated by Git Re-Basin, we introduce a correlation-permutation approach that aligns a music encoder's internal layers with a speech encoder. We extend previous work to the case of merging transformer layers. The method computes a permutation matrix that maximizes the model's features-wise cross-correlations layer by layer, enabling effective fusion of these otherwise disjoint models. The merged model retains speech capabilities through this method while significantly enhancing music performance, achieving an improvement of 14.83 points in average score compared to linear interpolation model merging. This work allows the creation of unified audio models from independently trained encoders.

* Under review

View paper on

Share this with someone who'll enjoy it:

Title:A correlation-permutation approach for speech-music encoders model merging

Paper and Code