Radiological images are currently adopted as visual evidence for COVID-19 diagnosis in clinical practice. Using deep models to realize automated infection measurement and COVID-19 diagnosis is important for faster examination based on radiological imaging. Unfortunately, collecting large amounts of training data systematically in the early stage of a pandemic is difficult. To address this problem, we explore the feasibility of learning deep models for COVID-19 diagnosis from a single radiological image by synthesizing diverse radiological images. Specifically, we propose a novel conditional generative model, called CoSinGAN, which can be learned from a single radiological image with a given condition, i.e., the annotations of the lung and the COVID-19 infection. Our CoSinGAN is able to capture the conditional distribution of the visual findings of COVID-19 infection, and further synthesize diverse, high-resolution radiological images that match the input conditions precisely. Both deep classification and segmentation networks trained on samples synthesized by CoSinGAN achieve notable COVID-19 infection detection accuracy. These results are significantly better than those of counterparts trained on the same extremely small number of real samples (1 or 2) with strong data augmentation, and approach those of counterparts trained on a large dataset (2,846 real images). This confirms that our method can significantly reduce the performance gap between deep models trained on extremely small datasets and those trained on large datasets, and thus has the potential to enable learning COVID-19 diagnosis from a few radiological images in the early stage of a COVID-19 pandemic. Our code is made publicly available at https://github.com/PengyiZhang/CoSinGAN.
We aim to provide a computationally cheap yet effective approach for fine-grained image classification (FGIC) in this paper. Compared to previous methods armed with sophisticated part localization modules for fine-grained feature learning, our approach attains this goal by improving the semantics of the sub-features of a global feature. To this end, we first obtain sub-feature semantics by rearranging the feature channels of a CNN into different groups through channel permutation, which is realized implicitly without modifying the backbone network structure. A weighted combination regularization, derived from matching the prediction distributions of the global feature and its sub-features, is then employed to guide the learned groups to be activated on local parts with strong discriminability, thus increasing the discriminability of the global feature at fine-grained scales. Our approach adds negligible extra parameters to the backbone CNN and can be implemented as a plug-and-play module and trained end-to-end with only image-level supervision. Experiments on four fine-grained benchmark datasets verify the effectiveness of our approach and validate that its performance is comparable to state-of-the-art methods. Code is available at {\it \url{https://github.com/cswluo/SEF}}
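The channel-grouping-plus-prediction-matching idea above can be sketched numerically. This is a minimal illustration, not the paper's implementation: the function and parameter names (`sub_feature_regularizer`, `w_global`, `w_subs`, `n_groups`) are hypothetical, the channel permutation is assumed to have been applied already, and a plain per-group KL term stands in for the paper's weighted combination regularization.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    """KL divergence between two discrete distributions."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def sub_feature_regularizer(feature, w_global, w_subs, n_groups):
    """Split a global feature vector into channel groups and penalize the
    divergence between each sub-feature's prediction and the global one,
    so each group is pushed toward class-discriminative local evidence."""
    p_global = softmax(feature @ w_global)      # prediction from global feature
    groups = np.split(feature, n_groups)        # channel permutation assumed done
    reg = 0.0
    for g, w in zip(groups, w_subs):
        p_sub = softmax(g @ w)                  # prediction from one sub-feature
        reg += kl(p_global, p_sub)
    return reg / n_groups
```

In training, a term like this would be added to the usual cross-entropy loss on the global prediction.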
Foot ulcers are a common complication of diabetes mellitus; they are associated with substantial morbidity and mortality and remain a major risk factor for lower leg amputation. Extracting accurate morphological features from foot wounds is crucial for proper treatment. Although visual and manual inspection by medical professionals is the common approach to extracting these features, it is subjective and error-prone. Computer-mediated approaches are alternative solutions for segmenting the lesions and extracting related morphological features. Among the various computer-based approaches proposed for image segmentation, deep learning-based methods, and more specifically convolutional neural networks (CNNs), have shown excellent performance on various image segmentation tasks, including medical image segmentation. In this work, we propose an ensemble approach based on two encoder-decoder CNN models, namely LinkNet and UNet, to perform foot ulcer segmentation. To deal with limited training samples, we use pre-trained weights (EfficientNetB1 for the LinkNet model and EfficientNetB2 for the UNet model) and further pre-training on the Medetec dataset. We also apply a number of morphology-based and colour-based augmentation techniques to train the models. We integrate five-fold cross-validation, test-time augmentation and result fusion into our proposed ensemble approach to boost the segmentation performance. Applied to a publicly available foot ulcer segmentation dataset and the MICCAI 2021 Foot Ulcer Segmentation (FUSeg) Challenge, our method achieved state-of-the-art data-based Dice scores of 92.07% and 88.80%, respectively. Our method achieved first rank on the FUSeg challenge leaderboard. The Dockerised guideline, inference code and saved trained models are publicly available in the published GitHub repository: https://github.com/masih4/Foot_Ulcer_Segmentation
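The fusion step above (test-time augmentation plus averaging over ensemble members) can be sketched as follows. This is an illustrative skeleton under assumed conventions: `model` is any callable mapping an image to a probability map, flips stand in for whatever augmentations the authors actually used, and the fusion is a plain mean followed by thresholding.

```python
import numpy as np

def tta_predict(model, image):
    """Average a model's probability maps over horizontal/vertical flips,
    un-flipping each prediction before averaging (simple TTA scheme)."""
    views = [image, image[:, ::-1], image[::-1, :], image[::-1, ::-1]]
    undo  = [lambda p: p, lambda p: p[:, ::-1],
             lambda p: p[::-1, :], lambda p: p[::-1, ::-1]]
    preds = [u(model(v)) for v, u in zip(views, undo)]
    return np.mean(preds, axis=0)

def ensemble_segment(models, image, threshold=0.5):
    """Fuse TTA-averaged predictions of several models (e.g. the folds of
    LinkNet and UNet) into one binary segmentation mask."""
    fused = np.mean([tta_predict(m, image) for m in models], axis=0)
    return (fused >= threshold).astype(np.uint8)
```

With five folds per architecture, `models` would hold ten trained networks.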
Identification, classification, and quantification of crop defects are of paramount interest to farmers for taking preventive measures and decreasing yield loss through necessary remedial actions. Owing to the vastness of agricultural fields, manual inspection of crops is tedious and time-consuming. UAV-based data collection, observation, identification, and quantification of the defective leaf area is considered an effective solution. The present work attempts to estimate the percentage of affected groundnut leaf area across four regions of Andhra Pradesh using image processing techniques. The proposed method involves a colour space transformation combined with a thresholding technique to perform the segmentation. Calibration measures are performed during acquisition with respect to the UAV capture distance, angle and other relevant camera parameters. Finally, our method estimates the consolidated leaf area and the defective area. The image analysis results across these four regions reveal that around 14-28% of the leaf area is affected across the groundnut fields, and the yield will be diminished correspondingly. Hence, it is recommended to spray pesticides only on the affected regions of the field to improve plant growth and thereby increase yield.
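The percentage-of-affected-area computation can be illustrated with a deliberately simplified colour threshold. This is only a sketch: the function name and the green-dominance rule are assumptions here, whereas the actual method uses a proper colour space transformation and calibrated thresholds.

```python
import numpy as np

def affected_leaf_percentage(rgb, green_margin=10):
    """Count pixels whose green channel clearly dominates red and blue as
    healthy leaf; treat the remaining (non-background) leaf pixels as
    affected, and return the affected percentage of the total leaf area."""
    rgb = rgb.astype(np.int32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    leaf = (r + g + b) > 0                    # crude background mask
    healthy = leaf & (g > r + green_margin) & (g > b + green_margin)
    affected = leaf & ~healthy
    total = leaf.sum()
    return 100.0 * affected.sum() / total if total else 0.0
```

A real pipeline would additionally correct for the calibrated capture distance and angle before converting pixel counts to physical area.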
Predicting all applicable labels for a given image is known as multi-label classification. Compared to the standard multi-class case (where each image has only one label), it is considerably more challenging to annotate training data for multi-label classification. When the number of potential labels is large, human annotators find it difficult to mention all applicable labels for each training image. Furthermore, in some settings detection is intrinsically difficult, e.g., finding small object instances in high-resolution images. As a result, multi-label training data is often plagued by false negatives. We consider the hardest version of this problem, in which annotators provide only one relevant label for each image. As a result, training sets will have only one positive label per image and no confirmed negatives. We explore this special case of learning from missing labels across four different multi-label image classification datasets for both linear classifiers and end-to-end fine-tuned deep networks. We extend existing multi-label losses to this setting and propose novel variants that constrain the number of expected positive labels during training. Surprisingly, we show that in some cases it is possible to approach the performance of fully labeled classifiers despite training with significantly fewer confirmed labels.
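The single-positive setting above admits a simple baseline loss: treat the one observed label as positive, assume every other label is negative, and regularize the predicted label count toward an expected value. This is a sketch of that idea under assumed names (`single_positive_loss`, `k_expected`, `lam`); the paper's exact loss variants differ in detail.

```python
import numpy as np

def single_positive_loss(probs, pos_idx, k_expected, lam=0.1, eps=1e-12):
    """Assume-negative BCE plus an expected-positives regularizer:
    - the single observed label (pos_idx) contributes a positive BCE term,
    - all other labels are treated as negatives,
    - the total predicted probability mass is pulled toward k_expected."""
    p = np.clip(probs, eps, 1 - eps)
    loss = -np.log(p[pos_idx])                    # observed positive
    neg = np.delete(p, pos_idx)
    loss += -np.sum(np.log(1 - neg))              # assumed negatives
    loss += lam * (p.sum() - k_expected) ** 2     # expected positive count
    return float(loss)
```

The regularizer mitigates the false negatives introduced by the assume-negative term when an image actually has several relevant labels.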
In this paper, we present a versatile method for visual localization. It is based on robust image retrieval for coarse camera pose estimation and robust local features for accurate pose refinement. Our method is top-ranked on various public datasets, demonstrating its ability to generalize and its wide range of applications. To facilitate experiments, we introduce kapture, a flexible data format and processing pipeline for structure from motion and visual localization, which is released as open source. We furthermore provide all datasets used in this paper in the kapture format to facilitate research and data processing. The code can be found at https://github.com/naver/kapture; the datasets, as well as more information, updates, and news, can be found at https://europe.naverlabs.com/research/3d-vision/kapture.
For a given image generation problem, the intrinsic image manifold is often low-dimensional. We use the intuition that it is much better to train the GAN generator by minimizing the distributional distance between real and generated images in a low-dimensional feature space representing such a manifold than in the original pixel space. We use the feature space of the GAN discriminator for this representation. For the distributional distance, we employ one of two choices: the Fr\'{e}chet distance or direct optimal transport (OT); these lead us to two new GAN methods, Fr\'{e}chet-GAN and OT-GAN, respectively. The idea of employing the Fr\'{e}chet distance comes from the success of the Fr\'{e}chet Inception Distance as a solid evaluation metric in image generation. Fr\'{e}chet-GAN is attractive in several ways. We propose an efficient, numerically stable approach to calculating the Fr\'{e}chet distance and its gradient. Estimating the Fr\'{e}chet distance requires significantly less computation time than OT; this allows Fr\'{e}chet-GAN to use a much larger mini-batch size in training. More importantly, we conduct experiments on a number of benchmark datasets and show that Fr\'{e}chet-GAN (in particular) and OT-GAN have significantly better image generation capabilities than existing representative primal and dual GAN approaches based on the Wasserstein distance.
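For reference, the Fréchet distance between two Gaussians is \(\|\mu_1-\mu_2\|^2 + \mathrm{Tr}\,(C_1 + C_2 - 2(C_1 C_2)^{1/2})\). The sketch below computes it for the special case of diagonal covariances, where the matrix square root reduces to elementwise operations; the general case (and the paper's stable gradient computation) requires a full matrix square root.

```python
import numpy as np

def frechet_distance_diag(mu1, var1, mu2, var2):
    """Frechet distance between N(mu1, diag(var1)) and N(mu2, diag(var2)):
    ||mu1 - mu2||^2 + sum(var1 + var2 - 2*sqrt(var1*var2))."""
    mu1, var1 = np.asarray(mu1, float), np.asarray(var1, float)
    mu2, var2 = np.asarray(mu2, float), np.asarray(var2, float)
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2))
    return float(mean_term + cov_term)
```

In an FID-style use, the means and variances would be estimated from real and generated feature batches (here, discriminator features rather than Inception features).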
Robots can effectively grasp and manipulate objects using their 3D models. In this paper, we propose a simple shape representation and a reconstruction method that outperforms state-of-the-art methods in terms of geometric metrics and enables grasp generation with high precision and success. Our reconstruction method models the object geometry as a pair of depth images, composing the "shell" of the object. This representation allows using image-to-image residual ConvNet architectures for 3D reconstruction, generates object reconstructions directly in the camera frame, and generalizes well to novel object types. Moreover, an object shell can be converted into an object mesh in a fraction of a second, providing a time- and memory-efficient alternative to voxel or implicit representations. We explore the application of the shell representation to grasp planning. With rigorous experimental validation, both in simulation and on a real setup, we show that shell reconstruction encapsulates sufficient geometric information to generate precise grasps and the associated grasp quality with over 90% accuracy. Diverse grasps computed on shell reconstructions allow the robot to select and execute grasps in cluttered scenes with a success rate of more than 93%.
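To make the "pair of depth images" concrete, the shell can be back-projected into a camera-frame point cloud with a standard pinhole model. This is an illustrative sketch: the function name, the entry/exit depth-map convention, and the intrinsics `fx, fy, cx, cy` are assumptions, not the paper's API.

```python
import numpy as np

def shell_to_points(entry_depth, exit_depth, fx, fy, cx, cy):
    """Back-project the two shell depth images (near and far surface as
    seen from the camera) into one (N, 3) point cloud in the camera frame."""
    h, w = entry_depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    points = []
    for depth in (entry_depth, exit_depth):
        z = depth.astype(float)
        valid = z > 0                       # zero depth marks background
        x = (u - cx) * z / fx               # pinhole back-projection
        y = (v - cy) * z / fy
        points.append(np.stack([x[valid], y[valid], z[valid]], axis=1))
    return np.concatenate(points, axis=0)
```

Meshing would then connect corresponding pixels of the two surfaces, which is what makes the shell-to-mesh conversion so fast.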
This paper focuses on deriving an optimal image smoother. The optimization is done by minimizing the norm of the Laplace operator in the image coordinate system. Discretizing the Laplace operator and applying the Euler-Lagrange method results in a weighted average scheme for the optimal smoother. Satellite imagery can be smoothed with this optimal smoother; it is also very fast and can be used for detecting anomalies in images. A real anomaly detection problem is considered for the Qom region in Iran. Satellite images in different bands are smoothed, and by comparing the smoothed and original images in each band, maps of the anomalies are produced. A comparison between the derived method and existing methods reveals that it is more efficient at detecting anomalies in the region.
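A weighted average scheme of the kind described can be illustrated as follows. This is a generic sketch, not the paper's derived weights: each interior pixel is replaced by the uniform mean of its four neighbours (a Jacobi-style update for a Laplacian smoothness objective), whereas the Euler-Lagrange derivation in the paper yields its own optimal weights.

```python
import numpy as np

def smooth(image, iterations=1):
    """Iteratively replace each interior pixel by the mean of its four
    neighbours; border pixels are left unchanged in this simple sketch."""
    u = image.astype(float).copy()
    for _ in range(iterations):
        v = u.copy()
        v[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                u[1:-1, :-2] + u[1:-1, 2:])
        u = v
    return u
```

Anomaly maps can then be formed band by band as the difference between the original and smoothed images.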
Deep image inpainting aims to restore damaged or missing regions of an image with realistic content. While having a wide range of applications, such as object removal and image recovery, deep inpainting techniques also carry the risk of being manipulated for image forgery. A promising countermeasure against such forgeries is deep inpainting detection, which aims to locate the inpainted regions in an image. In this paper, we make the first attempt towards universal detection of deep inpainting, where the detection network generalizes well when detecting different deep inpainting methods. To this end, we first propose a novel data generation approach to produce a universal training dataset, which imitates the noise discrepancies that exist between real and inpainted image contents, to train universal detectors. We then design a Noise-Image Cross-fusion Network (NIX-Net) to effectively exploit the discriminative information contained in both the images and their noise patterns. We empirically show, on multiple benchmark datasets, that our approach outperforms existing detection methods by a large margin and generalizes well to unseen deep inpainting techniques. Our universal training dataset can also significantly boost the generalizability of existing detection methods.
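A noise-pattern input of the kind NIX-Net consumes can be approximated by a high-pass residual. This is only a stand-in sketch: the function name and the 3x3 box filter are assumptions, and the actual noise extraction in the paper is more sophisticated.

```python
import numpy as np

def noise_residual(image):
    """High-pass residual: subtract a 3x3 box-filtered version of the
    image from itself, leaving mostly sensor/processing noise behind."""
    img = image.astype(float)
    padded = np.pad(img, 1, mode="edge")
    blur = np.zeros_like(img)
    for dy in (-1, 0, 1):                 # 3x3 box filter via shifted sums
        for dx in (-1, 0, 1):
            blur += padded[1 + dy : 1 + dy + img.shape[0],
                           1 + dx : 1 + dx + img.shape[1]]
    blur /= 9.0
    return img - blur
```

Inpainted regions tend to leave a different residual statistic than camera-captured content, which is the discrepancy a noise branch can learn to exploit.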