The conclusions provided by deep neural networks (DNNs) must be carefully scrutinized to determine whether they are universal or architecture dependent. The term DAG-DNN refers to a graphical representation of a DNN in which the architecture is expressed as a direct-acyclic graph (DAG), on which arcs are associated with functions. The level of a node denotes the maximum number of hops between the input node and the node of interest. In the current study, we demonstrate that DAG-DNNs can be used to derive all functions defined on various sub-architectures of the DNN. We also demonstrate that the functions defined in a DAG-DNN can be derived via a sequence of lower-triangular matrices, each of which provides the transition of functions defined in sub-graphs up to nodes at a specified level. The lifting structure associated with lower-triangular matrices makes it possible to perform the structural pruning of a network in a systematic manner. The fact that decomposition is universally applicable to all DNNs means that network pruning could theoretically be applied to any DNN, regardless of the underlying architecture. We demonstrate that it is possible to obtain the winning ticket (sub-network and initialization) for a weak version of the lottery ticket hypothesis, based on the fact that the sub-network with initialization can achieve training performance on par with that of the original network using the same number of iterations or fewer.
A general lack of understanding pertaining to deep feedforward neural networks (DNNs) can be attributed partly to a lack of tools with which to analyze the composition of non-linear functions, and partly to a lack of mathematical models applicable to the diversity of DNN architectures. In this paper, we made a number of basic assumptions pertaining to activation functions, non-linear transformations, and DNN architectures in order to use the un-rectifying method to analyze DNNs via directed acyclic graphs (DAGs). DNNs that satisfy these assumptions are referred to as general DNNs. Our construction of an analytic graph was based on an axiomatic method in which DAGs are built from the bottom-up through the application of atomic operations to basic elements in accordance with regulatory rules. This approach allows us to derive the properties of general DNNs via mathematical induction. We show that using the proposed approach, some properties hold true for general DNNs can be derived. This analysis advances our understanding of network functions and could promote further theoretical insights if the host of analytical tools for graphs can be leveraged.
The un-rectifying technique expresses a non-linear point-wise activation function as a data-dependent variable, which means that the activation variable along with its input and output can all be employed in optimization. The ReLU network in this study was un-rectified means that the activation functions could be replaced with data-dependent activation variables in the form of equations and constraints. The discrete nature of activation variables associated with un-rectifying ReLUs allows the reformulation of deep learning problems as problems of combinatorial optimization. However, we demonstrate that the optimal solution to a combinatorial optimization problem can be preserved by relaxing the discrete domains of activation variables to closed intervals. This makes it easier to learn a network using methods developed for real-domain constrained optimization. We also demonstrate that by introducing data-dependent slack variables as constraints, it is possible to optimize a network based on the augmented Lagrangian approach. This means that our method could theoretically achieve global convergence and all limit points are critical points of the learning problem. In experiments, our novel approach to solving the compressed sensing recovery problem achieved state-of-the-art performance when applied to the MNIST database and natural images.
We consider deep feedforward neural networks with rectified linear units from a signal processing perspective. In this view, such representations mark the transition from using a single (data-driven) linear representation to utilizing a large collection of affine linear representations tailored to particular regions of the signal space. This paper provides a precise description of the individual affine linear representations and corresponding domain regions that the (data-driven) neural network associates to each signal of the input space. In particular, we describe atomic decompositions of the representations and, based on estimating their Lipschitz regularity, suggest some conditions that can stabilize learning independent of the network depth. Such an analysis may promote further theoretical insight from both the signal processing and machine learning communities.
Frames are the foundation of the linear operators used in the decomposition and reconstruction of signals, such as the discrete Fourier transform, Gabor, wavelets, and curvelet transforms. The emergence of sparse representation models has shifted of the emphasis in frame theory toward sparse l1-minimization problems. In this paper, we apply frame theory to the sparse representation of signals in which a synthesis dictionary is used for a frame and an analysis dictionary is used for a dual frame. We sought to formulate a novel dual frame design in which the sparse vector obtained through the decomposition of any signal is also the sparse solution representing signals based on a reconstruction frame. Our findings demonstrate that this type of dual frame cannot be constructed for over-complete frames, thereby precluding the use of any linear analysis operator in driving the sparse synthesis coefficient for signal representation. Nonetheless, the best approximation to the sparse synthesis solution can be derived from the analysis coefficient using the canonical dual frame. In this study, we developed a novel dictionary learning algorithm (called Parseval K-SVD) to learn a tight-frame dictionary. We then leveraged the analysis and synthesis perspectives of signal representation with frames to derive optimization formulations for problems pertaining to image recovery. Our preliminary, results demonstrate that the images recovered using this approach are correlated to the frame bounds of dictionaries, thereby demonstrating the importance of using different dictionaries for different applications.
Blind image restoration is a non-convex problem which involves restoration of images from an unknown blur kernel. The factors affecting the performance of this restoration are how much prior information about an image and a blur kernel are provided and what algorithm is used to perform the restoration task. Prior information on images is often employed to restore the sharpness of the edges of an image. By contrast, no consensus is still present regarding what prior information to use in restoring from a blur kernel due to complex image blurring processes. In this paper, we propose modelling of a blur kernel as a sparse linear combinations of basic 2-D patterns. Our approach has a competitive edge over the existing blur kernel modelling methods because our method has the flexibility to customize the dictionary design, which makes it well-adaptive to a variety of applications. As a demonstration, we construct a dictionary formed by basic patterns derived from the Kronecker product of Gaussian sequences. We also compare our results with those derived by other state-of-the-art methods, in terms of peak signal to noise ratio (PSNR).