"Image": models, code, and papers

Hyperspectral Image Denoising and Anomaly Detection Based on Low-rank and Sparse Representations

Mar 12, 2021
Lina Zhuang, Lianru Gao, Bing Zhang, Xiyou Fu, Jose M. Bioucas-Dias

Hyperspectral imaging measures the amount of electromagnetic energy across the instantaneous field of view at a very high resolution in hundreds or thousands of spectral channels. This enables the detection of objects and the identification of materials that differ only subtly from one another. However, the increase in spectral resolution often means that fewer photons are received in each channel, so the noise linked to the image formation process is greater. This degradation limits the quality of the extracted information and its potential applications. Thus, denoising is a fundamental problem in hyperspectral image (HSI) processing. As images of natural scenes with highly correlated spectral channels, HSIs are characterized by a high level of self-similarity and can be well approximated by low-rank representations. These characteristics underlie the state-of-the-art methods used in HSI denoising. However, where there are rarely occurring pixel types, the denoising performance of these methods is not optimal, and the subsequent detection of these pixels may be compromised. To address these hurdles, in this article, we introduce RhyDe (Robust hyperspectral Denoising), a powerful HSI denoiser, which implements explicit low-rank representation, promotes self-similarity, and, by using a form of collaborative sparsity, preserves rare pixels. The denoising and detection effectiveness of the proposed robust HSI denoiser is illustrated using semireal and real data.

Highly-Efficient Binary Neural Networks for Visual Place Recognition

Feb 24, 2022
Bruno Ferrarini, Michael Milford, Klaus D. McDonald-Maier, Shoaib Ehsan

Visual place recognition (VPR) is a fundamental task for autonomous navigation as it enables a robot to localize itself in the workspace when a known location is detected. Although accuracy is an essential requirement for a VPR technique, computational and energy efficiency are no less important for real-world applications. CNN-based techniques achieve state-of-the-art VPR performance but are computationally intensive and energy demanding. Binary neural networks (BNNs) have recently been proposed to address VPR efficiently. Although a typical BNN is an order of magnitude more efficient than a CNN, its processing time and energy usage can be further improved. In a typical BNN, the first convolution is not completely binarized for the sake of accuracy. Consequently, the first layer is the slowest network stage, requiring a large share of the entire computational effort. This paper presents a class of BNNs for VPR that combines depthwise separable factorization and binarization to replace the first convolutional layer to improve computational and energy efficiency. Our best model achieves state-of-the-art VPR performance while spending considerably less time and energy to process an image than a BNN using a non-binary convolution as a first stage.
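To make the efficiency argument concrete, the following is a minimal sketch that compares the multiply-accumulate (MAC) count of a standard convolution with that of its depthwise separable factorization. The layer shape (224x224 input, 3 input channels, 64 output channels, 3x3 kernel) and the function names are assumptions chosen for illustration and do not come from the paper; the sketch covers only the factorization, not binarization.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a standard k x k convolution (stride 1, 'same' padding)."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs for a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise


# Hypothetical first-layer shape, chosen only for illustration.
std = standard_conv_macs(224, 224, 3, 64, 3)
sep = depthwise_separable_macs(224, 224, 3, 64, 3)
ratio = sep / std  # equals 1/c_out + 1/k**2

print(f"standard: {std} MACs, separable: {sep} MACs, ratio: {ratio:.3f}")
```

Under these assumed dimensions the factorized layer needs roughly one eighth of the MACs of the standard convolution, since the ratio reduces to 1/c_out + 1/k².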

* 8 pages, 10 figures, 2 tables

Talk, Don't Write: A Study of Direct Speech-Based Image Retrieval

Apr 05, 2021
Ramon Sanabria, Austin Waters, Jason Baldridge

Speech-based image retrieval has been studied as a proxy for joint representation learning, usually without emphasis on retrieval itself. As such, it is unclear how well speech-based retrieval can work in practice -- both in an absolute sense and versus alternative strategies that combine automatic speech recognition (ASR) with strong text encoders. In this work, we extensively study and expand choices of encoder architectures, training methodology (including unimodal and multimodal pretraining), and other factors. Our experiments cover different types of speech in three datasets: Flickr Audio, Places Audio, and Localized Narratives. Our best model configuration achieves large gains over state of the art, e.g., pushing recall-at-one from 21.8% to 33.2% for Flickr Audio and 27.6% to 53.4% for Places Audio. We also show our best speech-based models can match or exceed cascaded ASR-to-text encoding when speech is spontaneous, accented, or otherwise hard to automatically transcribe.
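For readers unfamiliar with the recall-at-one metric quoted above, here is a minimal sketch of how recall-at-k can be computed from a query-by-image similarity matrix. The function name, the toy scores, and the convention that the correct image for query i sits at index i are all assumptions made for illustration, not details taken from the paper.

```python
def recall_at_k(similarity, k=1):
    """Fraction of queries whose correct item appears among the k highest-scoring items.

    similarity[i][j] is the score between query i and item j; the correct
    item for query i is assumed to be item i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Item indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy scores for three spoken queries against three images (made up).
scores = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.2, 0.7],
]
r1 = recall_at_k(scores, k=1)
r2 = recall_at_k(scores, k=2)
print(r1, r2)
```

In this toy case the second query ranks the wrong image first, so recall-at-one is 2/3 while recall-at-two is 1.0.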

* Submitted to INTERSPEECH 2021

Learning Affinity-Aware Upsampling for Deep Image Matting

Nov 29, 2020
Yutong Dai, Hao Lu, Chunhua Shen

We show that learning affinity in upsampling provides an effective and efficient approach to exploit pairwise interactions in deep networks. Second-order features are commonly used in dense prediction to build adjacent relations with a learnable module after upsampling such as non-local blocks. Since upsampling is essential, learning affinity in upsampling can avoid additional propagation layers, offering the potential for building compact models. By looking at existing upsampling operators from a unified mathematical perspective, we generalize them into a second-order form and introduce Affinity-Aware Upsampling (A2U) where upsampling kernels are generated using a lightweight low-rank bilinear model and are conditioned on second-order features. Our upsampling operator can also be extended to downsampling. We discuss alternative implementations of A2U and verify their effectiveness on two detail-sensitive tasks: image reconstruction on a toy dataset; and a large-scale image matting task where affinity-based ideas constitute mainstream matting approaches. In particular, results on the Composition-1k matting dataset show that A2U achieves a 14% relative improvement in the SAD metric against a strong baseline with negligible increase of parameters (<0.5%). Compared with the state-of-the-art matting network, we achieve 8% higher performance with only 40% model complexity.

Geometric deep learning reveals the spatiotemporal fingerprint of microscopic motion

Feb 13, 2022
Jesús Pineda, Benjamin Midtvedt, Harshith Bachimanchi, Sergio Noé, Daniel Midtvedt, Giovanni Volpe, Carlo Manzo

The characterization of dynamical processes in living systems provides important clues for their mechanistic interpretation and link to biological functions. Thanks to recent advances in microscopy techniques, it is now possible to routinely record the motion of cells, organelles, and individual molecules at multiple spatiotemporal scales in physiological conditions. However, the automated analysis of dynamics occurring in crowded and complex environments still lags behind the acquisition of microscopic image sequences. Here, we present a framework based on geometric deep learning that achieves the accurate estimation of dynamical properties in various biologically-relevant scenarios. This deep-learning approach relies on a graph neural network enhanced by attention-based components. By processing object features with geometric priors, the network is capable of performing multiple tasks, from linking coordinates into trajectories to inferring local and global dynamic properties. We demonstrate the flexibility and reliability of this approach by applying it to real and simulated data corresponding to a broad range of biological experiments.

* 17 pages, 5 figures, 2 supplementary figures

Solving Inverse Problems with Hybrid Deep Image Priors: the challenge of preventing overfitting

Nov 03, 2020
Zhaodong Sun

We mainly analyze and solve the overfitting problem of deep image prior (DIP). Deep image prior can solve inverse problems such as super-resolution, inpainting and denoising. The main advantage of DIP over other deep learning approaches is that it does not need access to a large dataset. However, due to the large number of parameters of the neural network and noisy data, DIP overfits to the noise in the image as the number of iterations grows. In this thesis, we use hybrid deep image priors to avoid overfitting. The hybrid priors combine DIP with an explicit prior such as total variation or with an implicit prior such as a denoising algorithm. We use the alternating direction method of multipliers (ADMM) to incorporate the new prior and try different forms of ADMM to avoid extra computation caused by the inner loop of ADMM steps. We also study the relation between the dynamics of gradient descent and the overfitting phenomenon. The numerical results show the hybrid priors play an important role in preventing overfitting. Besides, we try to fit the image along some directions and find this method can reduce overfitting when the noise level is large. When the noise level is small, it does not considerably reduce the overfitting problem.
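As an illustration of the explicit prior mentioned above, the following is a minimal sketch of the anisotropic total variation of a small grayscale image stored as nested lists. The function name and the toy images are assumptions for illustration; the thesis may use a different discretization (for example an isotropic form), and this sketch does not include DIP or ADMM.

```python
def total_variation(img):
    """Anisotropic total variation: sum of absolute differences between
    vertically and horizontally adjacent pixels."""
    rows, cols = len(img), len(img[0])
    tv = 0.0
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows:
                tv += abs(img[r + 1][c] - img[r][c])  # vertical neighbour
            if c + 1 < cols:
                tv += abs(img[r][c + 1] - img[r][c])  # horizontal neighbour
    return tv


# Toy images (made up): a constant patch and the same patch with two perturbed pixels.
flat = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
noisy = [[1.0, 2.0, 1.0], [1.0, 0.0, 1.0]]

print(total_variation(flat), total_variation(noisy))
```

The constant patch has zero total variation while the perturbed one does not, which is why penalizing this quantity discourages a reconstruction from fitting pixel-level noise.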

Piracy-Resistant DNN Watermarking by Block-Wise Image Transformation with Secret Key

Apr 09, 2021
MaungMaung AprilPyone, Hitoshi Kiya

In this paper, we propose a novel DNN watermarking method that utilizes a learnable image transformation method with a secret key. The proposed method embeds a watermark pattern in a model by using learnable transformed images and allows us to remotely verify the ownership of the model. As a result, it is piracy-resistant, so the original watermark cannot be overwritten by a pirated watermark, and adding a new watermark decreases the model accuracy, unlike most of the existing DNN watermarking methods. In addition, it does not require a special pre-defined training set or trigger set. We empirically evaluated the proposed method on the CIFAR-10 dataset. The results show that it was resilient against fine-tuning and pruning attacks while maintaining a high watermark-detection accuracy.

Knowledge AI: New Medical AI Solution for Medical image Diagnosis

Jan 08, 2021
Yingni Wang, Shuge Lei, Jian Dai, Kehong Yuan

The implementation of medical AI has always been a problem. The effectiveness of traditional perceptual AI algorithms in medical image processing needs to be improved. Here we propose a method of knowledge AI, which is a combination of perceptual AI and clinical knowledge and experience. Based on this method, the geometric information mining of medical images can represent the experience and information and evaluate the quality of medical images.

* 9 pages, 8 figures. arXiv admin note: text overlap with arXiv:2101.02639

Satellite Image Classification with Deep Learning

Oct 13, 2020
Mark Pritt, Gary Chern

Satellite imagery is important for many applications including disaster response, law enforcement, and environmental monitoring. These applications require the manual identification of objects and facilities in the imagery. Because the geographic expanses to be covered are great and the analysts available to conduct the searches are few, automation is required. Yet traditional object detection and classification algorithms are too inaccurate and unreliable to solve the problem. Deep learning is a family of machine learning algorithms that have shown promise for the automation of such tasks. It has achieved success in image understanding by means of convolutional neural networks. In this paper we apply them to the problem of object and facility recognition in high-resolution, multi-spectral satellite imagery. We describe a deep learning system for classifying objects and facilities from the IARPA Functional Map of the World (fMoW) dataset into 63 different classes. The system consists of an ensemble of convolutional neural networks and additional neural networks that integrate satellite metadata with image features. It is implemented in Python using the Keras and TensorFlow deep learning libraries and runs on a Linux server with an NVIDIA Titan X graphics card. At the time of writing the system is in 2nd place in the fMoW TopCoder competition. Its total accuracy is 83%, the F1 score is 0.797, and it classifies 15 of the classes with accuracies of 95% or better.

* 2017 IEEE Applied Imagery Pattern Recognition Workshop (AIPR), Washington, DC, USA, 2017, pp. 1-7
* 7 pages, 18 figures

Attention Enables Zero Approximation Error

Feb 24, 2022
Zhiying Fang, Yidong Ouyang, Ding-Xuan Zhou, Guang Cheng

Deep learning models have been widely applied in various aspects of daily life. Many variant models based on deep learning structures have achieved even better performance. Attention-based architectures have become almost ubiquitous in deep learning structures. In particular, the transformer model has now overtaken the convolutional neural network in image classification tasks to become the most widely used tool. However, the theoretical properties of attention-based models are seldom considered. In this work, we show that with suitable adaptations, the single-head self-attention transformer with a fixed number of transformer encoder blocks and free parameters is able to generate any desired polynomial of the input with no error. The number of transformer encoder blocks is the same as the degree of the target polynomial. Even more exciting, we find that these transformer encoder blocks in this model do not need to be trained. As a direct consequence, we show that the single-head self-attention transformer with increasing numbers of free parameters is universal. These surprising theoretical results clearly explain the outstanding performance of the transformer model and may shed light on future modifications in real applications. We also provide some experiments to verify our theoretical result.
