Abstract:To evaluate the performance of audio signal processing algorithms and to train data-driven algorithms, e.g., as applied in hearing instruments, either simulated or recorded data can be used. While large batches of simulated data can be generated using mathematical models, recorded data provide a more adequate representation of real-life scenarios. Therefore, in this paper, the Hearing Instrument Dataset in Various Acoustical Scenarios (HIDVAS) is introduced. This dataset consists of both impulse responses and audio recordings using eight external loudspeakers, two external microphones, and a dummy head. On this dummy head behind-the-ear (BTE) hearing instrument shells with two microphones per shell are mounted, and in the dummy head's ears receiver-in-canal (RIC) hearing instrument loudspeakers are inserted. The dummy head also contains microphones located at its eardrums. The impulse responses have been computed from a swept-sine recording for each microphone-loudspeaker pair, and the audio recordings have been obtained by playing back audio (male and female speech, speech shaped noise, singing voice, stringed instrument, wind instrument, and percussion instrument) through each individual loudspeaker and recording simultaneously using all microphones. These recordings have been repeated for four hearing instrument domes (open, semi-open, closed, and no-RIC) in three reverberation conditions in one room (T30 = 0.09 s, T30 = 0.47 s, and T30 = 0.73 s), and in one reverberation condition in a different room (T30 = 1.48 s). The usage of the dataset as a 'hearing instrument in a box' is exemplified with three example use cases.
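
As a rough illustration of how an impulse response can be recovered from a swept-sine recording, the sketch below uses regularized frequency-domain deconvolution on a noise-free toy signal. The sweep parameters, the toy impulse response, the regularization constant, and the names `exp_sweep` and `estimate_ir` are assumptions made for illustration; the abstract does not state which deconvolution procedure was used for HIDVAS.

```python
import numpy as np

def exp_sweep(f0, f1, duration, fs):
    """Exponential sine sweep from f0 to f1 Hz (parameters are illustrative)."""
    t = np.arange(int(duration * fs)) / fs
    rate = np.log(f1 / f0)
    phase = 2 * np.pi * f0 * duration / rate * (np.exp(t * rate / duration) - 1.0)
    return np.sin(phase)

def estimate_ir(recording, sweep, ir_len, eps=1e-8):
    """Regularized spectral division: H = R S* / (|S|^2 + eps)."""
    n = len(recording) + len(sweep)  # zero-pad so circular wrap-around is avoided
    R = np.fft.rfft(recording, n)
    S = np.fft.rfft(sweep, n)
    H = R * np.conj(S) / (np.abs(S) ** 2 + eps)
    return np.fft.irfft(H, n)[:ir_len]

fs = 16000
sweep = exp_sweep(50.0, 7000.0, 1.0, fs)

# Toy "room": a direct path plus two reflections (assumed, not from the dataset).
true_ir = np.zeros(64)
true_ir[5], true_ir[20], true_ir[40] = 1.0, -0.5, 0.25

# Simulated microphone signal: the sweep played through the toy room.
recording = np.convolve(sweep, true_ir)
ir_hat = estimate_ir(recording, sweep, ir_len=64)
```

With a real recording, measurement noise and the band limits of the sweep would make the choice of `eps` matter far more than it does in this noise-free example.
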
Abstract:Cell-free massive multi-input-multi-output (CFmMIMO) communication networks aim to provide uniform quality of service by distributing access points (APs) across a coverage area. In user-centric variants, each user equipment (UE) can choose a cluster of APs with the best channel conditions (e.g., the closest APs) for accessing service. This approach eliminates the notion of cells with dedicated regions and APs, as found in cellular mMIMO communication networks. Estimating uplink channels between UEs and APs is a crucial step in CFmMIMO communication networks; however, existing channel estimation (CE) approaches typically originate from mMIMO systems without considering the unique properties of CFmMIMO communication networks. For instance, shorter AP-UE distances in CFmMIMO systems result in Rician channel models with prominent line of sight (LoS) components between APs and UEs, motivating cooperation between APs for improved performance. In this paper, we propose a cooperative minimum-mean-squared-error (MMSE)-based uplink CE approach where APs share their linearly compressed signals as fused signals with other APs in the same cluster. The proposed approach is optimal, i.e., its performance is equivalent to that of the centralized CE approach, where APs share their uncompressed raw signals. Notably, this optimality is achieved in one shot; that is, given the required correlation matrices, the optimal fusion filters and estimators are derived non-iteratively. Consequently, the proposed approach guarantees lower communication overhead for cooperative CE compared to the centralized approach. Numerical experiments corroborate the superior performance of the proposed cooperative CE approaches in terms of CE accuracy and convergence rate.
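
For orientation, the sketch below shows a plain single-AP LMMSE estimate of a Rician channel (non-zero LoS mean plus correlated scattering) from a noisy observation. It is not the cooperative scheme proposed in the paper. The simplified model `y = h + n`, the exponential correlation matrix, the LoS phase ramp, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
num_antennas, num_trials, noise_var = 4, 5000, 0.5

idx = np.arange(num_antennas)
# Assumed spatial correlation of the scattered (non-LoS) component.
R = 0.7 ** np.abs(idx[:, None] - idx[None, :])
# Assumed LoS mean: a unit-modulus phase ramp across the antennas.
h_bar = np.exp(1j * np.pi * 0.3 * idx)

def crandn(*shape):
    """Circularly symmetric complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# Rician channel realizations and noisy observations (one column per trial).
L = np.linalg.cholesky(R)
h = h_bar[:, None] + L @ crandn(num_antennas, num_trials)
y = h + np.sqrt(noise_var) * crandn(num_antennas, num_trials)

# LMMSE estimate: h_hat = h_bar + R (R + sigma^2 I)^{-1} (y - h_bar).
gain = R @ np.linalg.inv(R + noise_var * np.eye(num_antennas))
h_lmmse = h_bar[:, None] + gain @ (y - h_bar[:, None])

# Compare against using the raw observation as the estimate.
mse_raw = np.mean(np.abs(y - h) ** 2)
mse_lmmse = np.mean(np.abs(h_lmmse - h) ** 2)
```

In this toy setting `mse_lmmse` should come out below `mse_raw`, since the estimator exploits both the known mean and the correlation matrix.
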
Abstract:In public address systems and hearing aids, the maximally achievable amplification or gain is limited by acoustic feedback. Therefore, in order to be able to apply a higher gain, feedback cancellation methods are required. In addition, it is oftentimes also desirable to dereverberate a recorded signal, that is, remove the late reverberation component of the signal, before playing it back. In this paper, it is shown that under two mild conditions, the acoustic feedback signal can be written as a reverberant version of the source signal. Therefore, it is possible to treat the joint dereverberation and acoustic feedback cancellation problem as a dereverberation-only problem, meaning that dereverberation algorithms can be applied to the joint problem. Simulations corroborate this finding.
Abstract:In a wireless acoustic sensor network (WASN), devices (i.e., nodes) can collaborate through distributed algorithms to collectively perform audio signal processing tasks. This paper focuses on the distributed estimation of node-specific desired speech signals using network-wide Wiener filtering. The objective is to match the performance of a centralized system that would have access to all microphone signals, while reducing the communication bandwidth usage of the algorithm. Existing solutions, such as the distributed adaptive node-specific signal estimation (DANSE) algorithm, converge towards the multichannel Wiener filter (MWF) which solves a centralized linear minimum mean square error (LMMSE) signal estimation problem. However, they do so iteratively, which can be slow and impractical. Many solutions also assume that all nodes observe the same set of sources of interest, which is often not the case in practice. To overcome these limitations, we propose the distributed multichannel Wiener filter (dMWF) for fully connected WASNs. The dMWF is non-iterative and optimal even when nodes observe different sets of sources. In this algorithm, nodes exchange neighbor-pair-specific, low-dimensional (fused) signals estimating the contribution of sources observed by both nodes in the pair. We formally prove the optimality of dMWF and demonstrate its performance in simulated speech enhancement experiments. The proposed algorithm is shown to outperform DANSE in terms of objective metrics after short operation times, highlighting the benefit of its iterationless design.
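
The centralized MWF that serves as the performance target can be sketched as the solution of a linear MMSE problem, `w = R_yy^{-1} r_yd`. The example below is only that centralized baseline on real-valued toy data, not the dMWF itself. The steering vector, noise level, and the use of an oracle desired signal to estimate `r_yd` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_mics, num_samples = 4, 20000

# Assumed toy scenario: one source reaching four microphones with fixed gains.
steering = np.array([1.0, 0.8, -0.6, 0.5])
speech = rng.standard_normal(num_samples)
noise = 0.7 * rng.standard_normal((num_mics, num_samples))
mics = np.outer(steering, speech) + noise

# Desired signal: the speech component at the reference microphone (index 0).
desired = steering[0] * speech

# Sample correlation matrix and cross-correlation vector (oracle, for illustration).
R_yy = mics @ mics.T / num_samples
r_yd = mics @ desired / num_samples

# MWF: solve R_yy w = r_yd rather than forming the inverse explicitly.
w_mwf = np.linalg.solve(R_yy, r_yd)
estimate = w_mwf @ mics

mse_ref = np.mean((mics[0] - desired) ** 2)   # unprocessed reference microphone
mse_mwf = np.mean((estimate - desired) ** 2)  # after multichannel Wiener filtering
```

In practice the desired signal is not observable, so `r_yd` would have to be estimated indirectly, for example from speech-plus-noise and noise-only statistics.
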
Abstract:Cell-free massive-multiple-input-multiple-output (CFmMIMO) is a key enabler for sixth-generation (6G) wireless communication networks, where distributed access points (APs) jointly serve user equipments (UEs). In commonly adopted channel models for CFmMIMO networks, inter-AP channel correlation is assumed to be absent, thereby eliminating the potential benefits of centralized processing. However, by carefully designing the pilot transmission phase, the AP received signals during pilot transmission can become correlated, and thus, centralization can improve channel estimation performance, despite the absence of inter-AP channel correlation. In this paper, we propose a channel estimation scheme, termed master-assisted channel estimation (MACE), that aims to leverage inter-AP signal correlation by means of partially centralized processing and hence improve channel estimation performance. In MACE, a subset of APs fuse and forward their received pilot signals to a master AP, which then performs channel estimation using the fused signals together with its locally received signals. This scheme strikes a balance between local and fully centralized processing by leveraging inter-AP signal correlation, while reducing fronthaul signaling and computational complexity. Numerical experiments demonstrate that MACE consistently outperforms local channel estimation, where inter-AP signal correlation is neglected.
Abstract:In a cell-free massive MIMO (CFmMIMO) network with a daisy-chain fronthaul, the amount of information that each access point (AP) needs to communicate with the next AP in the chain is determined by the location of the AP in the sequential fronthaul. Therefore, we propose two sequential processing strategies to combat the adverse effect of fronthaul compression on the sum of users' spectral efficiency (SE): 1) linearly increasing fronthaul capacity allocation among APs and 2) Two-Path users' signal estimation. The two strategies show superior performance in terms of sum SE compared to the equal fronthaul capacity allocation and Single-Path sequential signal estimation.
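
One possible reading of a linearly increasing fronthaul capacity allocation is sketched below: a fixed total capacity is split so that the share of the AP at position `l` in the chain is proportional to `l`. The abstract does not give the actual allocation rule, so the function name `linear_fronthaul_allocation`, the proportional-to-index weighting, and the example numbers are all hypothetical.

```python
def linear_fronthaul_allocation(total_capacity, num_aps):
    """Split total_capacity so that AP l (1-based chain position) gets a share
    proportional to l. Hypothetical rule, used only for illustration."""
    weight_sum = num_aps * (num_aps + 1) / 2  # 1 + 2 + ... + num_aps
    return [total_capacity * l / weight_sum for l in range(1, num_aps + 1)]

def equal_fronthaul_allocation(total_capacity, num_aps):
    """Baseline: every AP gets the same share of the total capacity."""
    return [total_capacity / num_aps] * num_aps

# Example with assumed numbers: 30 capacity units shared by 4 APs in a chain.
linear_shares = linear_fronthaul_allocation(30.0, 4)
equal_shares = equal_fronthaul_allocation(30.0, 4)
```

The intuition is that APs further along the chain forward information accumulated from more upstream APs, so giving them a larger share may reduce the compression loss.
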
Abstract:Cell-free massive multiple-input-multiple-output is considered a promising technology for the next generation of wireless communication networks. The main idea is to distribute a large number of access points (APs) in a geographical region to serve the user equipments (UEs) cooperatively. In the uplink, one of two types of operations is often adopted: centralized or distributed. In centralized operation, channel estimation and data decoding are performed at the central processing unit (CPU), whereas in distributed operation, channel estimation occurs at the APs and data detection at the CPU. In this paper, we propose a novel uplink operation, termed Master-Assisted Distributed Uplink Operation (MADUO), where each UE is assigned a master AP, which receives soft data estimates from the other APs and decodes the data using its local signals and the received data estimates. Numerical experiments demonstrate that the proposed operation performs comparably to the centralized operation and balances fronthaul signaling and computational complexity.

Abstract:Two algorithms for combined acoustic echo cancellation (AEC) and noise reduction (NR) are analysed, namely the generalised echo and interference canceller (GEIC) and the extended multichannel Wiener filter (MWFext). Previously, these algorithms have been examined for linear echo paths, and assuming access to voice activity detectors (VADs) that separately detect desired speech and echo activity. However, algorithms implementing VADs may introduce detection errors. Therefore, in this paper, the previous analyses are extended by 1) modelling general nonlinear echo paths by means of the generalised Bussgang decomposition, and 2) modelling VAD error effects in each specific algorithm, thereby also allowing specific VAD assumptions to be modelled. It is found and verified with simulations that, generally, the MWFext achieves a higher NR performance, while the GEIC achieves a more robust AEC performance.

Abstract:In many speech recording applications, noise and acoustic echo corrupt the desired speech. Consequently, combined noise reduction (NR) and acoustic echo cancellation (AEC) is required. Generally, a cascade approach is followed, i.e., the AEC and NR are designed in isolation by selecting a separate signal model, formulating a separate cost function, and using a separate solution strategy. The AEC and NR are then cascaded one after the other, not accounting for their interaction. In this paper, however, an integrated approach is proposed to consider this interaction in a general multi-microphone/multi-loudspeaker setup. Therefore, a single signal model of either the microphone signal vector or the extended signal vector, obtained by stacking microphone and loudspeaker signals, is selected, a single mean squared error cost function is formulated, and a common solution strategy is used. Using this microphone signal model, a multichannel Wiener filter (MWF) is derived. Using the extended signal model, an extended MWF (MWFext) is derived, and several equivalent expressions are found, which nevertheless are interpretable as cascade algorithms. Specifically, the MWFext is shown to be equivalent to algorithms where the AEC precedes the NR (AEC-NR), the NR precedes the AEC (NR-AEC), and the extended NR (NRext) precedes the AEC and post-filter (PF) (NRext-AEC-PF). Under rank-deficiency conditions the MWFext is non-unique, such that this equivalence amounts to the expressions being specific, not necessarily minimum-norm solutions for this MWFext. The practical performances nonetheless differ due to non-stationarities and imperfect correlation matrix estimation, resulting in the AEC-NR and NRext-AEC-PF attaining best overall performance.

Abstract:A one-shot algorithm called iterationless DANSE (iDANSE) is introduced to perform distributed adaptive node-specific signal estimation (DANSE) in a fully connected wireless acoustic sensor network (WASN) deployed in an environment with non-overlapping latent signal subspaces. The iDANSE algorithm matches the performance of a centralized algorithm in a single processing cycle while devices exchange fused versions of their multichannel local microphone signals. Key advantages of iDANSE over currently available solutions are its iterationless nature, which favors deployment in real-time applications, and the fact that devices can exchange fewer fused signals than the number of latent sources in the environment. The proposed method is validated in numerical simulations including a speech enhancement scenario.