Recently, deep supervised hashing methods have become popular for large-scale image retrieval task. To preserve the semantic similarity notion between examples, they typically utilize the pairwise supervision or the triplet supervised information for hash learning. However, these methods usually ignore the semantic class information which can help the improvement of the semantic discriminative ability of hash codes. In this paper, we propose a novel hierarchy neighborhood discriminative hashing method. Specifically, we construct a bipartite graph to build coarse semantic neighbourhood relationship between the sub-class feature centers and the embeddings features. Moreover, we utilize the pairwise supervised information to construct the fined semantic neighbourhood relationship between embeddings features. Finally, we propose a hierarchy neighborhood discriminative hashing loss to unify the single-label and multilabel image retrieval problem with a one-stream deep neural network architecture. Experimental results on two largescale datasets demonstrate that the proposed method can outperform the state-of-the-art hashing methods.
This paper analyses how well a Fast Fully Convolutional Network (FastFCN) semantically segments satellite images and thus classifies Land Use/Land Cover(LULC) classes. Fast-FCN was used on Gaofen-2 Image Dataset (GID-2) to segment them in five different classes: BuiltUp, Meadow, Farmland, Water and Forest. The results showed better accuracy (0.93), precision (0.99), recall (0.98) and mean Intersection over Union (mIoU)(0.97) than other approaches like using FCN-8 or eCognition, a readily available software. We presented a comparison between the results. We propose FastFCN to be both faster and more accurate automated method than other existing methods for LULC classification.
Anomaly detection in high-dimensional data, such as images, is a challenging problem recently subject to intense research. Generative Adversarial Networks (GANs) have the ability to model the normal data distribution and, therefore, detect anomalies. Previously published GAN-based methods often assume that anomaly-free data is available for training. However, in real-life scenarios, this is not always the case. In this work, we examine the effects of contaminating training data with anomalies for state-of-the-art GAN-based anomaly detection methods. As expected, detection performance is reduced. To mitigate this problem, we propose to add an additional encoder network already at training time to adjust the structure of the latent space. As we show in our experiments, the distance in latent space from a query image to the origin is a highly significant cue to discriminate anomalies from normal data. The proposed method achieves state-of-the-art performance on CIFAR-10 as well as on a large new dataset with cell images.
Photography usually requires optics in conjunction with a recording device (an image sensor). Eliminating the optics could lead to new form factors for cameras. Here, we report a simple demonstration of imaging using a bare CMOS sensor that utilizes computation. The technique relies on the space variant point-spread functions resulting from the interaction of a point source in the field of view with the image sensor. These space-variant point-spread functions are combined with a reconstruction algorithm in order to image simple objects displayed on a discrete LED array as well as on an LCD screen. We extended the approach to video imaging at the native frame rate of the sensor. Finally, we performed experiments to analyze the parametric impact of the object distance. Improving the sensor designs and reconstruction algorithms can lead to useful cameras without optics.
In this paper, we propose the reproducing activation function to improve deep learning accuracy for various applications ranging from computer vision problems to scientific computing problems. The idea of reproducing activation functions is to employ several basic functions and their learnable linear combination to construct neuron-wise data-driven activation functions for each neuron. Armed with such activation functions, deep neural networks can reproduce traditional approximation tools and, therefore, approximate target functions with a smaller number of parameters than traditional neural networks. In terms of training dynamics of deep learning, reproducing activation functions can generate neural tangent kernels with a better condition number than traditional activation functions lessening the spectral bias of deep learning. As demonstrated by extensive numerical tests, the proposed activation function can facilitate the convergence of deep learning optimization for a solution with higher accuracy than existing deep learning solvers for audio/image/video reconstruction, PDEs, and eigenvalue problems.
Matching information across image and text modalities is a fundamental challenge for many applications that involve both vision and natural language processing. The objective is to find efficient similarity metrics to compare the similarity between visual and textual information. Existing approaches mainly match the local visual objects and the sentence words in a shared space with attention mechanisms. The matching performance is still limited because the similarity computation is based on simple comparisons of the matching features, ignoring the characteristics of their distribution in the data. In this paper, we address this limitation with an efficient learning objective that considers the discriminative feature distributions between the visual objects and sentence words. Specifically, we propose a novel Adversarial Discriminative Domain Regularization (ADDR) learning framework, beyond the paradigm metric learning objective, to construct a set of discriminative data domains within each image-text pairs. Our approach can generally improve the learning efficiency and the performance of existing metrics learning frameworks by regulating the distribution of the hidden space between the matching pairs. The experimental results show that this new approach significantly improves the overall performance of several popular cross-modal matching techniques (SCAN, VSRN, BFAN) on the MS-COCO and Flickr30K benchmarks.
Patterns, which are collections of elements arranged in regular or near-regular arrangements, are an important graphic art form and widely used due to their elegant simplicity and aesthetic appeal. When a pattern is encoded as a flat image without the underlying structure, manually editing the pattern is tedious and challenging as one has to both preserve the individual element shapes and their original relative arrangements. State-of-the-art deep learning frameworks that operate at the pixel level are unsuitable for manipulating such patterns. Specifically, these methods can easily disturb the shapes of the individual elements or their arrangement, and thus fail to preserve the latent structures of the input patterns. We present a novel differentiable compositing operator using pattern elements and use it to discover structures, in the form of a layered representation of graphical objects, directly from raw pattern images. This operator allows us to adapt current deep learning based image methods to effectively handle patterns. We evaluate our method on a range of patterns and demonstrate superiority in the context of pattern manipulations when compared against state-of-the-art
We propose a method for the approximation of high- or even infinite-dimensional feature vectors, which play an important role in supervised learning. The goal is to reduce the size of the training data, resulting in lower storage consumption and computational complexity. Furthermore, the method can be regarded as a regularization technique, which improves the generalizability of learned target functions. We demonstrate significant improvements in comparison to the computation of data-driven predictions involving the full training data set. The method is applied to classification and regression problems from different application areas such as image recognition, system identification, and oceanographic time series analysis.
Neural style transfer (NST), where an input image is rendered in the style of another image, has been a topic of considerable progress in recent years. Research over that time has been dominated by transferring aspects of color and texture, yet these factors are only one component of style. Other factors of style include composition, the projection system used, and the way in which artists warp and bend objects. Our contribution is to introduce a neural architecture that supports transfer of geometric style. Unlike recent work in this area, we are unique in being general in that we are not restricted by semantic content. This new architecture runs prior to a network that transfers texture style, enabling us to transfer texture to a warped image. This form of network supports a second novelty: we extend the NST input paradigm. Users can input content/style pair as is common, or they can chose to input a content/texture-style/geometry-style triple. This three image input paradigm divides style into two parts and so provides significantly greater versatility to the output we can produce. We provide user studies that show the quality of our output, and quantify the importance of geometric style transfer to style recognition by humans.
Hyperspectral imaging (HSI) has been extensively utilized for a number of real-world applications. HSI classification (HSIC) is a challenging task due to high inter-class similarity, high intra-class variability, overlapping, and nested regions. A 2D Convolutional Neural Network (CNN) is a viable approach whereby HSIC highly depends on both Spectral-Spatial information, therefore, 3D CNN can be an alternative but highly computational complex due to the volume and spectral dimensions. Furthermore, these models do not extract quality feature maps and may underperform over the regions having similar textures. Therefore, this work proposed a 3D CNN model that utilizes both spatial-spectral feature maps to attain good performance. In order to achieve the said performance, the HSI cube is first divided into small overlapping 3D patches. Later these patches are processed to generate 3D feature maps using a 3D kernel function over multiple contiguous bands that persevere the spectral information as well. Benchmark HSI datasets (Pavia University, Salinas and Indian Pines) are considered to validate the performance of our proposed method. The results are further compared with several state-of-the-art methods.