Abstract:In this article, we propose the optimization of the resolution of time-frequency atoms and the regularization of fitting models to obtain better representations of heart sound signals. This is done by evaluating the classification performance of deep learning (DL) networks in discriminating five heart valvular conditions based on a new class of time-frequency feature matrices derived from the fitting models. We inspect several combinations of resolution and regularization, and the optimal one is that provides the highest performance. To this end, a fitting model is obtained based on a heart sound signal and an overcomplete dictionary of Gabor atoms using elastic net regularization of linear models. We consider two different DL architectures, the first mainly consisting of a 1D convolutional neural network (CNN) layer and a long short-term memory (LSTM) layer, while the second is composed of 1D and 2D CNN layers followed by an LSTM layer. The networks are trained with two algorithms, namely stochastic gradient descent with momentum (SGDM) and adaptive moment (ADAM). Extensive experimentation has been conducted using a database containing heart sound signals of five heart valvular conditions. The best classification accuracy of $98.95\%$ is achieved with the second architecture when trained with ADAM and feature matrices derived from optimal models obtained with a Gabor dictionary consisting of atoms with high-time low-frequency resolution and imposing sparsity on the models.
Abstract:In Gaussian model-based multichannel audio source separation, the likelihood of observed mixtures of source signals is parametrized by source spectral variances and by associated spatial covariance matrices. These parameters are estimated by maximizing the likelihood through an Expectation-Maximization algorithm and used to separate the signals by means of multichannel Wiener filtering. We propose to estimate these parameters by applying nonnegative factorization based on prior information on source variances. In the nonnegative factorization, spectral basis matrices can be defined as the prior information. The matrices can be either extracted or indirectly made available through a redundant library that is trained in advance. In a separate step, applying nonnegative tensor factorization, two algorithms are proposed in order to either extract or detect the basis matrices that best represent the power spectra of the source signals in the observed mixtures. The factorization is achieved by minimizing the $β$-divergence through multiplicative update rules. The sparsity of factorization can be controlled by tuning the value of $β$. Experiments show that sparsity, rather than the value assigned to $β$ in the training, is crucial in order to increase the separation performance. The proposed method was evaluated in several mixing conditions. It provides better separation quality with respect to other comparable algorithms.
Abstract:Accurate vehicle counting through video surveillance is crucial for efficient traffic management. However, achieving high counting accuracy while ensuring computational efficiency remains a challenge. To address this, we propose a fully automated, video-based vehicle counting framework designed to optimize both computational efficiency and counting accuracy. Our framework operates in two distinct phases: \textit{estimation} and \textit{prediction}. In the estimation phase, the optimal region of interest (ROI) is automatically determined using a novel combination of three models based on detection scores, tracking scores, and vehicle density. This adaptive approach ensures compatibility with any detection and tracking method, enhancing the framework's versatility. In the prediction phase, vehicle counting is efficiently performed within the estimated ROI. We evaluated our framework on benchmark datasets like UA-DETRAC, GRAM, CDnet 2014, and ATON. Results demonstrate exceptional accuracy, with most videos achieving 100\% accuracy, while also enhancing computational efficiency, making processing up to four times faster than full-frame processing. The framework outperforms existing techniques, especially in complex multi-road scenarios, demonstrating robustness and superior accuracy. These advancements make it a promising solution for real-time traffic monitoring.