Abstract: In this work, we propose a method for computing centroids, or barycenters, in the spectral Wasserstein-2 metric for sets of power spectral densities, where the barycenters are restricted to the set of all-pole spectra of a given model order. This may be interpreted as finding an autoregressive representative for a set of second-order stationary Gaussian processes. While Wasserstein, or optimal transport, barycenters have previously been used successfully in spectral estimation and clustering, the resulting barycenters are non-parametric, and the complexity of representing and storing them depends on, e.g., the choice of discretization grid. In contrast, the proposed method yields compact, low-dimensional, and interpretable spectral centroids that can be used in downstream tasks. Computing the all-pole centroids corresponds to solving a non-convex optimization problem in the model parameters, and we present a gradient descent scheme for addressing it. Although convergence to a globally optimal point cannot be guaranteed, the sub-optimality of the obtained centroids can be quantified. The proposed method is illustrated on a phoneme classification problem.
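As a rough illustration of the objects involved, the sketch below evaluates an all-pole spectrum sigma^2 / |A(e^{j omega})|^2 on a frequency grid and computes a squared Wasserstein-2 distance between two such spectra from their quantile functions. It is not the method of the paper: the function names, the normalization of each spectrum to unit mass, and the treatment of [0, pi] as a non-periodic line segment are all assumptions made for this example.

```python
import numpy as np

def all_pole_psd(a, sigma2, omega):
    """All-pole PSD sigma2 / |A(e^{jw})|^2, with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.

    `a` holds the coefficients a_1..a_p (hypothetical parameterization).
    """
    a = np.asarray(a, dtype=float)
    k = np.arange(1, len(a) + 1)
    A = 1.0 + np.exp(-1j * np.outer(omega, k)) @ a
    return sigma2 / np.abs(A) ** 2

def w2_squared_1d(p, q, x, n_quantiles=2000):
    """Squared Wasserstein-2 distance between two densities sampled on grid x.

    Assumptions: both densities are rescaled to unit mass, and the domain is
    treated as a line segment, so the distance is the mean squared difference
    of the two quantile (inverse CDF) functions.
    """
    cp = np.cumsum(p / np.sum(p))
    cq = np.cumsum(q / np.sum(q))
    u = (np.arange(n_quantiles) + 0.5) / n_quantiles  # quantile levels in (0, 1)
    quant_p = np.interp(u, cp, x)  # approximate inverse CDF of p
    quant_q = np.interp(u, cq, x)  # approximate inverse CDF of q
    return np.mean((quant_p - quant_q) ** 2)

# Two AR(2) spectra with a resonance near 1.0 and 1.5 rad/sample (pole radius 0.95).
omega = np.linspace(0.0, np.pi, 2048)
s_a = all_pole_psd([-2 * 0.95 * np.cos(1.0), 0.95 ** 2], 1.0, omega)
s_b = all_pole_psd([-2 * 0.95 * np.cos(1.5), 0.95 ** 2], 1.0, omega)
d_ab = w2_squared_1d(s_a, s_b, omega)
```

Here the distance grows as the spectral peaks move apart in frequency, which is the property that makes transport-based centroids attractive for spectra whose main difference is the location of their resonances.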
Abstract: In this work, we consider the problem of jointly estimating a set of room impulse responses (RIRs) corresponding to closely spaced microphones. Accurate estimation of RIRs is crucial in acoustic applications such as speech enhancement, noise cancellation, and auralization. However, real-world constraints such as short excitation signals, low signal-to-noise ratios, and poor spectral excitation often render the estimation problem ill-posed. We address these challenges by means of optimal mass transport (OMT) regularization. In particular, we propose to use an OMT barycenter, or generalized mean, as a mechanism for information sharing between the microphones. This allows us to quantify and exploit similarities in the delay structures of the different microphones without having to impose rigid assumptions on the room acoustics. The resulting estimator is formulated as the solution to a convex optimization problem that can be solved using standard solvers. In numerical examples, we demonstrate the potential of the proposed method in addressing otherwise ill-conditioned estimation scenarios.
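To make the ill-posedness concrete, the sketch below sets up the single-microphone convolution model y = X h + noise and solves it with a plain Tikhonov (ridge) penalty. This is only a stand-in baseline: it is not the OMT-barycenter regularizer proposed in the paper, and the function names, the parameter `lam`, and the truncation of the convolution to the length of the excitation are assumptions made for this example.

```python
import numpy as np

def convolution_matrix(x, n_taps):
    """Matrix X such that X @ h equals np.convolve(x, h)[:len(x)]."""
    n = len(x)
    X = np.zeros((n, n_taps))
    for k in range(min(n_taps, n)):
        X[k:, k] = x[: n - k]  # column k is the excitation delayed by k samples
    return X

def ridge_rir(x, y, n_taps, lam):
    """Tikhonov-regularized least-squares RIR estimate (baseline, not OMT)."""
    X = convolution_matrix(x, n_taps)
    return np.linalg.solve(X.T @ X + lam * np.eye(n_taps), X.T @ y)

rng = np.random.default_rng(0)
h_true = rng.standard_normal(8)        # hypothetical 8-tap impulse response

# Long white excitation: the problem is well conditioned.
x_long = rng.standard_normal(200)
y_long = convolution_matrix(x_long, 8) @ h_true
h_hat = ridge_rir(x_long, y_long, 8, lam=1e-8)

# Short excitation: fewer observations than unknowns, so X is rank deficient
# and the data alone cannot determine h.
x_short = rng.standard_normal(5)
rank_short = np.linalg.matrix_rank(convolution_matrix(x_short, 8))
```

In the short-excitation case some additional structure has to be supplied by the regularizer; the abstract's proposal is to obtain that structure by sharing information across microphones through an OMT barycenter rather than through a generic penalty such as the one used here.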