In low light or short-exposure photography the image is often corrupted by noise. While longer exposure helps reduce the noise, it can produce blurry results due to the object and camera motion. The reconstruction of a noise-less image is an ill posed problem. Recent approaches for image denoising aim to predict kernels which are convolved with a set of successively taken images (burst) to obtain a clear image. We propose a deep neural network based approach called Multi-Kernel Prediction Networks (MKPN) for burst image denoising. MKPN predicts kernels of not just one size but of varying sizes and performs fusion of these different kernels resulting in one kernel per pixel. The advantages of our method are two fold: (a) the different sized kernels help in extracting different information from the image which results in better reconstruction and (b) kernel fusion assures retaining of the extracted information while maintaining computational efficiency. Experimental results reveal that MKPN outperforms state-of-the-art on our synthetic datasets with different noise levels.
We propose a general framework for neural network compression that is motivated by the Minimum Description Length (MDL) principle. For that we first derive an expression for the entropy of a neural network, which measures its complexity explicitly in terms of its bit-size. Then, we formalize the problem of neural network compression as an entropy-constrained optimization objective. This objective generalizes many of the compression techniques proposed in the literature, in that pruning or reducing the cardinality of the weight elements of the network can be seen special cases of entropy-minimization techniques. Furthermore, we derive a continuous relaxation of the objective, which allows us to minimize it using gradient based optimization techniques. Finally, we show that we can reach state-of-the-art compression results on different network architectures and data sets, e.g. achieving x71 compression gains on a VGG-like architecture.
The analysis of neuroimaging data poses several strong challenges, in particular, due to its high dimensionality, its strong spatio-temporal correlation and the comparably small sample sizes of the respective datasets. To address these challenges, conventional decoding approaches such as the searchlight reduce the complexity of the decoding problem by considering local clusters of voxels only. Thereby, neglecting the distributed spatial patterns of brain activity underlying many cognitive states. In this work, we introduce the DLight framework, which overcomes these challenges by utilizing a long short-term memory unit (LSTM) based deep neural network architecture to analyze the spatial dependency structure of whole-brain fMRI data. In order to maintain interpretability of the neuroimaging data, we adapt the layer-wise relevance propagation (LRP) method. Thereby, we enable the neuroscientist user to study the learned association of the LSTM between the data and the cognitive state of the individual. We demonstrate the versatility of DLight by applying it to a large fMRI dataset of the Human Connectome Project. We show that the decoding performance of our method scales better with large datasets, and moreover outperforms conventional decoding approaches, while still detecting physiologically appropriate brain areas for the cognitive states classified. We also demonstrate that DLight is able to detect these areas on several levels of data granularity (i.e., group, subject, trial, time point).
Machine learning (ML) techniques such as (deep) artificial neural networks (DNN) are solving very successfully a plethora of tasks and provide new predictive models for complex physical, chemical, biological and social systems. However, in most cases this comes with the disadvantage of acting as a black box, rarely providing information about what made them arrive at a particular prediction. This black box aspect of ML techniques can be problematic especially in medical diagnoses, so far hampering a clinical acceptance. The present paper studies the uniqueness of individual gait patterns in clinical biomechanics using DNNs. By attributing portions of the model predictions back to the input variables (ground reaction forces and full-body joint angles), the Layer-Wise Relevance Propagation (LRP) technique reliably demonstrates which variables at what time windows of the gait cycle are most relevant for the characterisation of gait patterns from a certain individual. By measuring the timeresolved contribution of each input variable to the prediction of ML techniques such as DNNs, our method describes the first general framework that enables to understand and interpret non-linear ML methods in (biomechanical) gait analysis and thereby supplies a powerful tool for analysis, diagnosis and treatment of human gait.
In recent years, deep neural networks have revolutionized many application domains of machine learning and are key components of many critical decision or predictive processes. Therefore, it is crucial that domain specialists can understand and analyze actions and pre- dictions, even of the most complex neural network architectures. Despite these arguments neural networks are often treated as black boxes. In the attempt to alleviate this short- coming many analysis methods were proposed, yet the lack of reference implementations often makes a systematic comparison between the methods a major effort. The presented library iNNvestigate addresses this by providing a common interface and out-of-the- box implementation for many analysis methods, including the reference implementation for PatternNet and PatternAttribution as well as for LRP-methods. To demonstrate the versatility of iNNvestigate, we provide an analysis of image classifications for variety of state-of-the-art neural network architectures.
We investigate Early Hybrid Automatic Repeat reQuest (E-HARQ) feedback schemes enhanced by Machine Learning techniques as possible path towards ultra-reliable and low-latency communication (URLLC). To this end we propose Machine Learning methods to predict the outcome of the decoding process ahead of the end of the transmission. We discuss different input features and classification algorithms ranging from traditional methods to newly developed supervised autoencoders and their prospects of reaching effective block error rates of $10^{-5}$ that are required for URLLC with only small latency overhead. We provide realistic performance estimates in a system model incorporating scheduling effects to demonstrate the feasibility of E-HARQ across different signal-to-noise ratios, subcode lengths, channel conditions and system loads.
Interpretability of deep neural networks is a recently emerging area of machine learning research targeting a better understanding of how models perform feature selection and derive their classification decisions. In this paper, two neural network architectures are trained on spectrogram and raw waveform data for audio classification tasks on a newly created audio dataset and layer-wise relevance propagation (LRP), a previously proposed interpretability method, is applied to investigate the models' feature selection and decision making. It is demonstrated that the networks are highly reliant on feature marked as relevant by LRP through systematic manipulation of the input data. Our results show that by making deep audio classifiers interpretable, one can analyze and compare the properties and strategies of different models beyond classification accuracy, which potentially opens up new ways for model improvements.
Deep networks are able to learn highly predictive models of video data. Due to video length, a common strategy is to train them on small video snippets. We apply the deep Taylor / LRP technique to understand the deep network's classification decisions, and identify a "border effect": a tendency of the classifier to look mainly at the bordering frames of the input. This effect relates to the step size used to build the video snippet, which we can then tune in order to improve the classifier's accuracy without retraining the model. To our knowledge, this is the the first work to apply the deep Taylor / LRP technique on any video analyzing neural network.
Artificial neural networks tend to learn only what they need for a task. A manipulation of the training data can counter this phenomenon. In this paper, we study the effect of different alterations of the training data, which limit the amount and position of information that is available for the decision making. We analyze the accuracy and robustness against semantic and black box attacks on the networks that were trained on different training data modifications for the particular example of morphing attacks. A morphing attack is an attack on a biometric facial recognition system where the system is fooled to match two different individuals with the same synthetic face image. Such a synthetic image can be created by aligning and blending images of the two individuals that should be matched with this image.
Adversarial attacks on deep learning models have been demonstrated to be imperceptible to a human, while decreasing the model performance considerably. Attempts to provide invariance against such attacks have denoised adversarial samples to only send cleaned samples to the classifier. In a similar spirit this paper proposes a novel effective strategy that allows to relax adversarial samples onto the underlying manifold of the (unknown) target class distribution. Specifically, given an off-manifold adversarial example, our Metroplis-adjusted Langevin algorithm (Mala) guided through a supervised denoising autoencoder network (sDAE) allows to drive the adversarial samples towards high density regions of the data generating distribution. So, in a nutshell the adversarial example is transformed back from off-manifold onto the data manifold for which the learning model was originally trained and where it can perform well and robustly. Experiments on various benchmark datasets show that our novel Malade method exhibits a high robustness against blackbox and whitebox attacks and outperforms state-of-the-art defense algorithms.