In recent years, deep learning has made great progress in many fields, such as image recognition, video analysis, natural language processing and speech recognition, as well as video super-resolution. In this survey, we comprehensively investigate 28 state-of-the-art video super-resolution methods based on deep learning. It is well known that exploiting the information across video frames is important for video super-resolution. Hence, we propose a taxonomy and classify the methods into six sub-categories according to how they utilize inter-frame information. Moreover, the architectures and implementation details (including input and output, loss function and learning rate) of all the methods are described in detail. Finally, we summarize and compare their performance on several benchmark datasets under different magnification factors. We also discuss some challenges that remain to be addressed by researchers in the video super-resolution community. We therefore expect this work to contribute to the future development of research in video super-resolution and to improve the understandability and transferability of existing and future techniques into practice.
Large-scale non-convex sparsity-constrained problems have recently gained extensive attention. Most existing deterministic optimization methods (e.g., GraSP) are not suitable for large-scale and high-dimensional problems, and thus stochastic optimization methods with hard thresholding (e.g., SVRGHT) have become more attractive. Inspired by GraSP, this paper proposes a new general relaxed gradient support pursuit (RGraSP) framework, in which the sub-algorithm only needs to satisfy a slack descent condition. We also design two specific semi-stochastic gradient hard thresholding algorithms. In particular, our algorithms require far fewer hard thresholding operations than SVRGHT, and their average per-iteration cost is much lower (i.e., O(d) vs. O(d log(d)) for SVRGHT), which leads to faster convergence. Our experimental results on both synthetic and real-world datasets show that our algorithms outperform state-of-the-art gradient hard thresholding methods.
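To make the hard thresholding operation concrete, the following Python sketch implements the operator H_k, which keeps the k largest-magnitude entries of a vector, together with a simplified SVRGHT-style loop for sparse least squares, min (1/2n)||Ax - b||^2 subject to ||x||_0 <= k. It illustrates the baseline that applies hard thresholding at every inner iteration; it is not the RGraSP framework or either of the two proposed algorithms. The function names, the least-squares objective, the step-size rule eta = 0.25 / max_i ||a_i||^2 and the epoch length are assumptions made for the example.

```python
import numpy as np


def hard_threshold(x, k):
    """H_k(x): keep the k largest-magnitude entries of x and zero out the rest."""
    out = np.zeros_like(x)
    if k <= 0:
        return out
    top = np.argpartition(np.abs(x), -k)[-k:]  # indices of the k largest |x_j|
    out[top] = x[top]
    return out


def svrght_least_squares(A, b, k, epochs=30, m=None, eta=None, seed=0):
    """Simplified SVRG-style gradient hard thresholding (illustrative only) for
    min_x (1/2n) ||Ax - b||^2  subject to  ||x||_0 <= k."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    m = m if m is not None else n
    if eta is None:
        # Assumed step-size rule: a fraction of 1 / max_i ||a_i||^2.
        eta = 0.25 / np.max(np.sum(A * A, axis=1))
    snapshot = np.zeros(d)
    for _ in range(epochs):
        full_grad = A.T @ (A @ snapshot - b) / n  # full gradient at the snapshot
        x = snapshot.copy()
        for _ in range(m):
            a_i = A[rng.integers(n)]
            # Semi-stochastic gradient: grad f_i(x) - grad f_i(snapshot) + full gradient;
            # for least squares the b_i terms cancel in the difference.
            g = a_i * (a_i @ (x - snapshot)) + full_grad
            x = hard_threshold(x - eta * g, k)  # thresholding at every inner step
        snapshot = x
    return snapshot
```

Selecting the top k entries with np.argpartition takes linear time on average, whereas a full sort costs O(d log(d)); in either case, this sketch invokes the operator once per inner iteration, which is the per-iteration thresholding that the proposed algorithms are designed to perform far less often.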
In this paper, we propose a new first-order gradient-based algorithm to train deep neural networks. We first introduce the sign operation of stochastic gradients (as in sign-based methods, e.g., SIGN-SGD) into ADAM, and call the result signADAM. Moreover, to bring the rates at which different features are fitted closer to each other, we define a confidence function that distinguishes different components of the gradients and apply it to our algorithm, which then generates sparser gradients than existing algorithms do. We call this new algorithm signADAM++. Both of our algorithms are easy to implement and can speed up the training of various deep neural networks. The motivation behind signADAM++ is to preferentially learn features from the most dissimilar samples by updating with large, useful gradient components while ignoring useless information in the stochastic gradients. We also establish theoretical convergence guarantees for our algorithms. Empirical results on various datasets and models show that our algorithms yield much better performance than many state-of-the-art algorithms, including SIGN-SGD, SIGNUM and ADAM. We also analyze the performance from multiple perspectives, including the loss landscape, and develop an adaptive method to further improve generalization. The source code is available at https://github.com/DongWanginxdu/signADAM-Learn-by-Confidence.
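The two ingredients that signADAM combines can be illustrated with a short Python sketch of a standard ADAM step and a SIGN-SGD step, both applied to the toy quadratic f(x) = 0.5 ||x||^2. The exact way signADAM inserts the sign operation into ADAM, and the confidence function of signADAM++, are not specified in this abstract, so neither is reproduced here; the toy objective, the function names and the hyperparameter values are assumptions made for illustration.

```python
import numpy as np


def adam_step(x, g, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard ADAM update with bias-corrected moment estimates."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * g      # first-moment EMA
    state["v"] = beta2 * state["v"] + (1 - beta2) * g * g  # second-moment EMA
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return x - lr * m_hat / (np.sqrt(v_hat) + eps)


def sign_sgd_step(x, g, state, lr=0.1):
    """One SIGN-SGD update: move each coordinate by lr against the gradient's sign.
    The state argument is unused; it only keeps the signature uniform."""
    return x - lr * np.sign(g)


def run(step, x0, n_steps, **kwargs):
    """Minimize the toy objective f(x) = 0.5 * ||x||^2, whose gradient is x."""
    x = np.asarray(x0, dtype=float)
    state = {"t": 0, "m": np.zeros_like(x), "v": np.zeros_like(x)}
    for _ in range(n_steps):
        g = x  # exact gradient of the toy quadratic
        x = step(x, g, state, **kwargs)
    return x
```

On the first step, ADAM's bias-corrected update reduces to lr * g / (|g| + eps), i.e., approximately lr * sign(g), so from x0 = [1, -2, 3] both rules move to roughly [0.9, -1.9, 2.9].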
This paper proposes a novel cascaded U-Net for brain tumor segmentation. Inspired by the distinct hierarchical structure of brain tumors, we design a cascaded deep network framework in which the whole tumor is segmented first and the tumor's internal substructures are then segmented. Because the increased network depth introduced by the cascade leads to a loss of accurate localization information in deeper layers, we construct many skip connections that link features at the same resolution and transmit detailed information from shallow layers to deeper layers. We then present a loss weighted sampling (LWS) scheme to address the problem of imbalanced data when training the network. Experimental results on the BraTS 2017 data show that our framework outperforms state-of-the-art segmentation algorithms, especially in terms of segmentation sensitivity.
In this paper, we propose a simple variant of the original SVRG, called variance reduced stochastic gradient descent (VR-SGD). Unlike SVRG and its proximal variant Prox-SVRG, VR-SGD sets the snapshot point and the starting point of each epoch to the average and the last iterate of the previous epoch, respectively. These settings allow us to use much larger learning rates, but also make the convergence analysis more challenging. We also design two different update rules for smooth and non-smooth objective functions, respectively, so VR-SGD can tackle non-smooth and/or non-strongly convex problems directly without any reduction techniques. Moreover, we analyze the convergence properties of VR-SGD for strongly convex problems and show that it attains linear convergence. Unlike its counterparts, which have no convergence guarantees for non-strongly convex problems, we also provide convergence guarantees for VR-SGD in this case, and we empirically verify that VR-SGD with varying learning rates achieves performance similar to that of its momentum-accelerated variant, which has the optimal convergence rate $\mathcal{O}(1/T^2)$. Finally, we apply VR-SGD to various machine learning problems, such as convex and non-convex empirical risk minimization and leading eigenvalue computation. Experimental results show that VR-SGD converges significantly faster than SVRG and Prox-SVRG, and usually outperforms state-of-the-art accelerated methods, e.g., Katyusha.
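The epoch structure described above (snapshot set to the average of the previous epoch's iterates, starting point set to its last iterate) can be sketched in Python for a smooth least-squares objective. This is a minimal illustration, not the authors' implementation: it uses only a plain SVRG-style inner update rather than the separate non-smooth update rule, and the least-squares objective, the function name, the epoch length m = 2n and the step-size rule are assumptions.

```python
import numpy as np


def vr_sgd_least_squares(A, b, epochs=30, m=None, eta=None, seed=0):
    """Sketch of VR-SGD-style epochs for the smooth objective
    f(x) = (1/2n) ||Ax - b||^2 = (1/n) sum_i 0.5 * (a_i . x - b_i)^2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    m = m if m is not None else 2 * n
    if eta is None:
        # Assumed step-size rule: a fraction of 1 / max_i ||a_i||^2.
        eta = 0.25 / np.max(np.sum(A * A, axis=1))
    x = np.zeros(d)         # starting point of the current epoch
    snapshot = np.zeros(d)  # point at which the full gradient is evaluated
    for _ in range(epochs):
        full_grad = A.T @ (A @ snapshot - b) / n
        running_sum = np.zeros(d)
        for _ in range(m):
            a_i = A[rng.integers(n)]
            # Variance-reduced gradient: grad f_i(x) - grad f_i(snapshot) + full gradient
            g = a_i * (a_i @ (x - snapshot)) + full_grad
            x = x - eta * g
            running_sum += x
        # VR-SGD choices: the next snapshot is the average of this epoch's iterates,
        # and the next epoch starts from the last iterate (x is carried over unchanged).
        snapshot = running_sum / m
    return x
```

For comparison, a common SVRG variant takes the last iterate as the next snapshot and restarts each epoch from that snapshot; replacing the final assignment in the epoch loop with `snapshot = x.copy()` would make the two points coincide in that way.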
For polarimetric SAR (PolSAR) image classification, it is challenging to classify aggregated terrain types, such as urban areas, into semantically homogeneous regions because of sharp bright-dark variations in intensity. An aggregated terrain type is formed by similar ground objects clustered together. In this paper, a polarimetric hierarchical semantic model (PHSM) is first proposed to overcome this difficulty, based on the construction of a primal-level and a middle-level semantic. The primal-level semantic is a polarimetric sketch map, which consists of sketch segments that serve as a sparse representation of a PolSAR image. The middle-level semantic is a region map, which extracts semantically homogeneous regions from the sketch map by exploiting the topological structure of the sketch segments. By mapping the region map onto the PolSAR image, a complex PolSAR scene is partitioned into aggregated, structural and homogeneous pixel-level subspaces, each characterized by relatively coherent terrain types. Then, a specific method is adopted for each of the three subspaces according to its characteristics, and polarimetric information is further exploited to improve the segmentation result. Experimental results on PolSAR data sets acquired with different bands and sensors demonstrate that the proposed method outperforms state-of-the-art methods in region homogeneity and edge preservation for terrain classification.