This paper introduces a cross adversarial source separation (CASS) framework via autoencoder, a new model that aims at separating an input signal consisting of a mixture of multiple components into individual components defined via adversarial learning and autoencoder fitting. CASS unifies popular generative networks like auto-encoders (AEs) and generative adversarial networks (GANs) in a single framework. The basic building block that filters the input signal and reconstructs the $i$-th target component is a pair of deep neural networks $\mathcal{EN}_i$ and $\mathcal{DE}_i$ as an encoder for dimension reduction and a decoder for component reconstruction, respectively. The decoder $\mathcal{DE}_i$ as a generator is enhanced by a discriminator network $\mathcal{D}_i$ that favors signal structures of the $i$-th component in the $i$-th given dataset as guidance through adversarial learning. In contrast with existing practices in AEs which trains each Auto-Encoder independently, or in GANs that share the same generator, we introduce cross adversarial training that emphasizes adversarial relation between any arbitrary network pairs $(\mathcal{DE}_i,\mathcal{D}_j)$, achieving state-of-the-art performance especially when target components share similar data structures.
This paper introduces a novel generative encoder (GE) model for generative imaging and image processing with applications in compressed sensing and imaging, image compression, denoising, inpainting, deblurring, and super-resolution. The GE model consists of a pre-training phase and a solving phase. In the pre-training phase, we separately train two deep neural networks: a generative adversarial network (GAN) with a generator $\G$ that captures the data distribution of a given image set, and an auto-encoder (AE) network with an encoder $\EN$ that compresses images following the estimated distribution by GAN. In the solving phase, given a noisy image $x=\mathcal{P}(x^*)$, where $x^*$ is the target unknown image, $\mathcal{P}$ is an operator adding an addictive, or multiplicative, or convolutional noise, or equivalently given such an image $x$ in the compressed domain, i.e., given $m=\EN(x)$, we solve the optimization problem \[ z^*=\underset{z}{\mathrm{argmin}} \|\EN(\G(z))-m\|_2^2+\lambda\|z\|_2^2 \] to recover the image $x^*$ in a generative way via $\hat{x}:=\G(z^*)\approx x^*$, where $\lambda>0$ is a hyperparameter. The GE model unifies the generative capacity of GANs and the stability of AEs in an optimization framework above instead of stacking GANs and AEs into a single network or combining their loss functions into one as in existing literature. Numerical experiments show that the proposed model outperforms several state-of-the-art algorithms.
Supervised learning from training data with imbalanced class sizes, a commonly encountered scenario in real applications such as anomaly/fraud detection, has long been considered a significant challenge in machine learning. Motivated by recent progress in curriculum and self-paced learning, we propose to adopt a semi-supervised learning paradigm by training a deep neural network, referred to as SelectNet, to selectively add unlabelled data together with their predicted labels to the training dataset. Unlike existing techniques designed to tackle the difficulty in dealing with class imbalanced training data such as resampling, cost-sensitive learning, and margin-based learning, SelectNet provides an end-to-end approach for learning from important unlabelled data "in the wild" that most likely belong to the under-sampled classes in the training data, thus gradually mitigates the imbalance in the data used for training the classifier. We demonstrate the efficacy of SelectNet through extensive numerical experiments on standard datasets in computer vision.
We study the approximation efficiency of function compositions in nonlinear approximation, especially the case when compositions are implemented using multi-layer feed-forward neural networks (FNNs) with ReLU activation functions. The central question of interest is what are the advantages of function compositions in generating dictionaries and what is the optimal implementation of function compositions via ReLU FNNs, especially in modern computing architecture. This question is answered by studying the $N$-term approximation rate, which is the decrease in error versus the number of computational nodes (neurons) in the approximant, together with parallel efficiency for the first time. First, for an arbitrary function $f$ on $[0,1]$, regardless of its smoothness and even the continuity, if $f$ can be approximated via nonlinear approximation using one-hidden-layer ReLU FNNs with an approximation rate $O(N^{-\eta})$, we quantitatively show that dictionaries with function compositions via deep ReLU FNNs can improve the approximation rate to $O(N^{-2\eta})$. Second, for H{\"o}lder continuous functions of order $\alpha$ with a uniform Lipchitz constant $\omega$ on a $d$-dimensional cube, we show that the $N$-term approximation via ReLU FNNs with two or three function compositions can achieve an approximation rate $O( N^{-2\alpha/d})$. The approximation rate can be improved to $O(L^{-2\alpha/d})$ by composing $L$ times, if $N$ is fixed and sufficiently large; but further compositions cannot achieve the approximation rate $O(N^{-\alpha L/d})$. Finally, considering the computational efficiency per training iteration in parallel computing, FNNs with $O(1)$ hidden layers are an optimal choice for approximating H{\"o}lder continuous functions if computing resources are enough.
Overfitting frequently occurs in deep learning. In this paper, we propose a novel regularization method called Drop-Activation to reduce overfitting and improve generalization. The key idea is to \emph{drop} nonlinear activation functions by setting them to be identity functions randomly during training time. During testing, we use a deterministic network with a new activation function to encode the average effect of dropping activations randomly. Experimental results on CIFAR-10, CIFAR-100, SVHN, and EMNIST show that Drop-Activation generally improves the performance of popular neural network architectures. Furthermore, unlike dropout, as a regularizer Drop-Activation can be used in harmony with standard training and regularization techniques such as Batch Normalization and AutoAug. Our theoretical analyses support the regularization effect of Drop-Activation as implicit parameter reduction and its capability to be used together with Batch Normalization.
This paper proposes a novel non-oscillatory pattern (NOP) learning scheme for several oscillatory data analysis problems including signal decomposition, super-resolution, and signal sub-sampling. To the best of our knowledge, the proposed NOP is the first algorithm for these problems with fully non-stationary oscillatory data with close and crossover frequencies, and general oscillatory patterns. NOP is capable of handling complicated situations while existing algorithms fail; even in simple cases, e.g., stationary cases with trigonometric patterns, numerical examples show that NOP admits competitive or better performance in terms of accuracy and robustness than several state-of-the-art algorithms.
This paper proposes a recursive diffeomorphism based regression method for one-dimensional generalized mode decomposition problem that aims at extracting generalized modes $\alpha_k(t)s_k(2\pi N_k\phi_k(t))$ from their superposition $\sum_{k=1}^K \alpha_k(t)s_k(2\pi N_k\phi_k(t))$. First, a one-dimensional synchrosqueezed transform is applied to estimate instantaneous information, e.g., $\alpha_k(t)$ and $N_k\phi_k(t)$. Second, a novel approach based on diffeomorphisms and nonparametric regression is proposed to estimate wave shape functions $s_k(t)$. These two methods lead to a framework for the generalized mode decomposition problem under a weak well-separation condition. Numerical examples of synthetic and real data are provided to demonstrate the fruitful applications of these methods.