Get our free extension to see links to code for papers anywhere online!Free extension: code links for papers anywhere!Free add-on: See code for papers anywhere!

Alexander Lin, Bahareh Tolooshams, Yves Atchadé, Demba Ba

Latent Gaussian models have a rich history in statistics and machine learning, with applications ranging from factor analysis to compressed sensing to time series analysis. The classical method for maximizing the likelihood of these models is the expectation-maximization (EM) algorithm. For problems with high-dimensional latent variables and large datasets, EM scales poorly because it needs to invert as many large covariance matrices as the number of data points. We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversion. Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation. In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.

Via

Bahareh Tolooshams, Satish Mulleti, Demba Ba, Yonina C. Eldar

The problem of sparse multichannel blind deconvolution (S-MBD) arises frequently in many engineering applications such as radar/sonar/ultrasound imaging. To reduce its computational and implementation cost, we propose a compression method that enables blind recovery from much fewer measurements with respect to the full received signal in time. The proposed compression measures the signal through a filter followed by a subsampling, allowing for a significant reduction in implementation cost. We derive theoretical guarantees for the identifiability and recovery of a sparse filter from compressed measurements. Our results allow for the design of a wide class of compression filters. We, then, propose a data-driven unrolled learning framework to learn the compression filter and solve the S-MBD problem. The encoder is a recurrent inference network that maps compressed measurements into an estimate of sparse filters. We demonstrate that our unrolled learning method is more robust to choices of source shapes and has better recovery performance compared to optimization-based methods. Finally, in applications with limited data (fewshot learning), we highlight the superior generalization capability of unrolled learning compared to conventional deep learning.

Via

Bahareh Tolooshams, Kazuhito Koishida

Deep learning-based speech enhancement has shown unprecedented performance in recent years. The most popular mono speech enhancement frameworks are end-to-end networks mapping the noisy mixture into an estimate of the clean speech. With growing computational power and availability of multichannel microphone recordings, prior works have aimed to incorporate spatial statistics along with spectral information to boost up performance. Despite an improvement in enhancement performance of mono output, the spatial image preservation and subjective evaluations have not gained much attention in the literature. This paper proposes a novel stereo-aware framework for speech enhancement, i.e., a training loss for deep learning-based speech enhancement to preserve the spatial image while enhancing the stereo mixture. The proposed framework is model independent, hence it can be applied to any deep learning based architecture. We provide an extensive objective and subjective evaluation of the trained models through a listening test. We show that by regularizing for an image preservation loss, the overall performance is improved, and the stereo aspect of the speech is better preserved.

Via

Bahareh Tolooshams, Demba Ba

The dictionary learning problem, representing data as a combination of few atoms, has long stood as a popular method for learning representations in statistics and signal processing. The most popular dictionary learning algorithm alternates between sparse coding and dictionary update steps, and a rich literature has studied its theoretical convergence. The growing popularity of neurally plausible unfolded sparse coding networks has led to the empirical finding that backpropagation through such networks performs dictionary learning. This paper offers the first theoretical proof for these empirical results through PUDLE, a Provable Unfolded Dictionary LEarning method. We highlight the impact of loss, unfolding, and backpropagation on convergence. We discover an implicit acceleration: as a function of unfolding, the backpropagated gradient converges faster and is more accurate than the gradient from alternating minimization. We complement our findings through synthetic and image denoising experiments. The findings support the use of accelerated deep learning optimizers and unfolded networks for dictionary learning.

Via

Andrew H. Song, Bahareh Tolooshams, Demba Ba

Convolutional dictionary learning (CDL), the problem of estimating shift-invariant templates from data, is typically conducted in the absence of a prior/structure on the templates. In data-scarce or low signal-to-noise ratio (SNR) regimes, which have received little attention from the community, learned templates overfit the data and lack smoothness, which can affect the predictive performance of downstream tasks. To address this limitation, we propose GPCDL, a convolutional dictionary learning framework that enforces priors on templates using Gaussian Processes (GPs). With the focus on smoothness, we show theoretically that imposing a GP prior is equivalent to Wiener filtering the learned templates, thereby suppressing high-frequency components and promoting smoothness. We show that the algorithm is a simple extension of the classical iteratively reweighted least squares, which allows the flexibility to experiment with different smoothness assumptions. Through simulation, we show that GPCDL learns smooth dictionaries with better accuracy than the unregularized alternative across a range of SNRs. Through an application to neural spiking data from rats, we show that learning templates by GPCDL results in a more accurate and visually-interpretable smooth dictionary, leading to superior predictive performance compared to non-regularized CDL, as well as parametric alternatives.

Via

Emmanouil Theodosis, Bahareh Tolooshams, Pranay Tankala, Abiy Tasissa, Demba Ba

Recent approaches in the theoretical analysis of model-based deep learning architectures have studied the convergence of gradient descent in shallow ReLU networks that arise from generative models whose hidden layers are sparse. Motivated by the success of architectures that impose structured forms of sparsity, we introduce and study a group-sparse autoencoder that accounts for a variety of generative models, and utilizes a group-sparse ReLU activation function to force the non-zero units at a given layer to occur in blocks. For clustering models, inputs that result in the same group of active units belong to the same cluster. We proceed to analyze the gradient dynamics of a shallow instance of the proposed autoencoder, trained with data adhering to a group-sparse generative model. In this setting, we theoretically prove the convergence of the network parameters to a neighborhood of the generating matrix. We validate our model through numerical analysis and highlight the superior performance of networks with a group-sparse ReLU compared to networks that utilize traditional ReLUs, both in sparse coding and in parameter recovery tasks. We also provide real data experiments to corroborate the simulated results, and emphasize the clustering capabilities of structured sparsity models.

Via

Bahareh Tolooshams, Satish Mulleti, Demba Ba, Yonina C. Eldar

We propose a learned-structured unfolding neural network for the problem of compressive sparse multichannel blind-deconvolution. In this problem, each channel's measurements are given as convolution of a common source signal and sparse filter. Unlike prior works where the compression is achieved either through random projections or by applying a fixed structured compression matrix, this paper proposes to learn the compression matrix from data. Given the full measurements, the proposed network is trained in an unsupervised fashion to learn the source and estimate sparse filters. Then, given the estimated source, we learn a structured compression operator while optimizing for signal reconstruction and sparse filter recovery. The efficient structure of the compression allows its practical hardware implementation. The proposed neural network is an autoencoder constructed based on an unfolding approach: upon training, the encoder maps the compressed measurements into an estimate of sparse filters using the compression operator and the source, and the linear convolutional decoder reconstructs the full measurements. We demonstrate that our method is superior to classical structured compressive sparse multichannel blind-deconvolution methods in terms of accuracy and speed of sparse filter recovery.

Via

Abiy Tasissa, Emmanouil Theodosis, Bahareh Tolooshams, Demba Ba

The sparse representation model has been successfully utilized in a number of signal and image processing tasks; however, recent research has highlighted its limitations in certain deep-learning architectures. This paper proposes a novel dense and sparse coding model that considers the problem of recovering a dense vector $\mathbf{x}$ and a sparse vector $\mathbf{u}$ given linear measurements of the form $\mathbf{y} = \mathbf{A}\mathbf{x}+\mathbf{B}\mathbf{u}$. Our first theoretical result proposes a new natural geometric condition based on the minimal angle between subspaces corresponding to the measurement matrices $\mathbf{A}$ and $\mathbf{B}$ to establish the uniqueness of solutions to the linear system. The second analysis shows that, under mild assumptions and sufficient linear measurements, a convex program recovers the dense and sparse components with high probability. The standard RIPless analysis cannot be directly applied to this setup. Our proof is a non-trivial adaptation of techniques from anisotropic compressive sensing theory and is based on an analysis of a matrix derived from the measurement matrices $\mathbf{A}$ and $\mathbf{B}$. We begin by demonstrating the effectiveness of the proposed model on simulated data. Then, to address its use in a dictionary learning setting, we propose a dense and sparse auto-encoder (DenSaE) that is tailored to it. We demonstrate that a) DenSaE denoises natural images better than architectures derived from the sparse coding model ($\mathbf{B}\mathbf{u}$), b) training the biases in the latter amounts to implicitly learning the $\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{u}$ model, and c) $\mathbf{A}$ and $\mathbf{B}$ capture low- and high-frequency contents, respectively.

Via

Bahareh Tolooshams, Ritwik Giri, Andrew H. Song, Umut Isik, Arvindh Krishnaswamy

Supervised deep learning has gained significant attention for speech enhancement recently. The state-of-the-art deep learning methods perform the task by learning a ratio/binary mask that is applied to the mixture in the time-frequency domain to produce the clean speech. Despite the great performance in the single-channel setting, these frameworks lag in performance in the multichannel setting as the majority of these methods a) fail to exploit the available spatial information fully, and b) still treat the deep architecture as a black box which may not be well-suited for multichannel audio processing. This paper addresses these drawbacks, a) by utilizing complex ratio masking instead of masking on the magnitude of the spectrogram, and more importantly, b) by introducing a channel-attention mechanism inside the deep architecture to mimic beamforming. We propose Channel-Attention Dense U-Net, in which we apply the channel-attention unit recursively on feature maps at every layer of the network, enabling the network to perform non-linear beamforming. We demonstrate the superior performance of the network against the state-of-the-art approaches on the CHiME-3 dataset.

Via

Thomas Chang, Bahareh Tolooshams, Demba Ba

Principal component analysis, dictionary learning, and auto-encoders are all unsupervised methods for learning representations from a large amount of training data. In all these methods, the higher the dimensions of the input data, the longer it takes to learn. We introduce a class of neural networks, termed RandNet, for learning representations using compressed random measurements of data of interest, such as images. RandNet extends the convolutional recurrent sparse auto-encoder architecture to dense networks and, more importantly, to the case when the input data are compressed random measurements of the original data. Compressing the input data makes it possible to fit a larger number of batches in memory during training. Moreover, in the case of sparse measurements,training is more efficient computationally. We demonstrate that, in unsupervised settings, RandNet performs dictionary learning using compressed data. In supervised settings, we show that RandNet can classify MNIST images with minimal loss in accuracy, despite being trained with random projections of the images that result in a 50% reduction in size. Overall, our results provide a general principled framework for training neural networks using compressed data.

Via