Harmonic inpainting with optimised data is very popular for inpainting-based image compression. We improve this approach in three important aspects. Firstly, we replace the standard finite differences discretisation by a finite element method with triangle elements. This does not only speed up inpainting and data selection, but even improves the reconstruction quality. Secondly, we propose highly efficient algorithms for spatial and tonal data optimisation that are several orders of magnitude faster than state-of-the-art methods. Last but not least, we show that our algorithms also allow working with very large images. This has previously been impractical due to the memory and runtime requirements of prior algorithms.
There has been significant research done on developing methods for improving robustness to distributional shift and uncertainty estimation. In contrast, only limited work has examined developing standard datasets and benchmarks for assessing these approaches. Additionally, most work on uncertainty estimation and robustness has developed new techniques based on small-scale regression or image classification tasks. However, many tasks of practical interest have different modalities, such as tabular data, audio, text, or sensor data, which offer significant challenges involving regression and discrete or continuous structured prediction. Thus, given the current state of the field, a standardized large-scale dataset of tasks across a range of modalities affected by distributional shifts is necessary. This will enable researchers to meaningfully evaluate the plethora of recently developed uncertainty quantification methods, as well as assessment criteria and state-of-the-art baselines. In this work, we propose the \emph{Shifts Dataset} for evaluation of uncertainty estimates and robustness to distributional shift. The dataset, which has been collected from industrial sources and services, is composed of three tasks, with each corresponding to a particular data modality: tabular weather prediction, machine translation, and self-driving car (SDC) vehicle motion prediction. All of these data modalities and tasks are affected by real, `in-the-wild' distributional shifts and pose interesting challenges with respect to uncertainty estimation. In this work we provide a description of the dataset and baseline results for all tasks.
Moving target detection plays an important role in computer vision. However, traditional algorithms such as frame difference and optical flow usually suffer from low accuracy or heavy computation. Recent algorithms such as deep learning-based convolutional neural networks have achieved high accuracy and real-time performance, but they usually need to know the classes of targets in advance, which limits the practical applications. Therefore, we proposed a model free moving target detection algorithm. This algorithm extracts the moving area through the difference of image features. Then, the color and location probability map of the moving area will be calculated through maximum a posteriori probability. And the target probability map can be obtained through the dot multiply between the two maps. Finally, the optimal moving target area can be solved by stochastic gradient descent on the target probability map. Results show that the proposed algorithm achieves the highest accuracy compared with state-of-the-art algorithms, without needing to know the classes of targets. Furthermore, as the existing datasets are not suitable for moving target detection, we proposed a method for producing evaluation dataset. Besides, we also proved the proposed algorithm can be used to assist target tracking.
Full-color imaging is significant in digital pathology. Compared with a grayscale image or a pseudo-color image that only contains the contrast information, it can identify and detect the target object better with color texture information. Fourier ptychographic microscopy (FPM) is a high-throughput computational imaging technique that breaks the tradeoff between high resolution (HR) and large field-of-view (FOV), which eliminates the artifacts of scanning and stitching in digital pathology and improves its imaging efficiency. However, the conventional full-color digital pathology based on FPM is still time-consuming due to the repeated experiments with tri-wavelengths. A color transfer FPM approach, termed CFPM was reported. The color texture information of a low resolution (LR) full-color pathologic image is directly transferred to the HR grayscale FPM image captured by only a single wavelength. The color space of FPM based on the standard CIE-XYZ color model and display based on the standard RGB (sRGB) color space were established. Different FPM colorization schemes were analyzed and compared with thirty different biological samples. The average root-mean-square error (RMSE) of the conventional method and CFPM compared with the ground truth is 5.3% and 5.7%, respectively. Therefore, the acquisition time is significantly reduced by 2/3 with the sacrifice of precision of only 0.4%. And CFPM method is also compatible with advanced fast FPM approaches to reduce computation time further.
In today's world, Neural Style Transfer (NST) has become a trendsetting term. NST combines two pictures, a content picture and a reference image in style (such as the work of a renowned painter) in a way that makes the output image look like an image of the material, but rendered with the form of a reference picture. However, there is no study using the artwork or painting of Bangladeshi painters. Bangladeshi painting has a long history of more than two thousand years and is still being practiced by Bangladeshi painters. This study generates NST stylized image on Bangladeshi paintings and analyzes the human point of view regarding the aesthetic preference of NST on Bangladeshi paintings. To assure our study's acceptance, we performed qualitative human evaluations on generated stylized images by 60 individual humans of different age and gender groups. We have explained how NST works for Bangladeshi paintings and assess NST algorithms, both qualitatively \& quantitatively. Our study acts as a pre-requisite for the impact of NST stylized image using Bangladeshi paintings on mobile UI/GUI and material translation from the human perspective. We hope that this study will encourage new collaborations to create more NST related studies and expand the use of Bangladeshi artworks.
The human visual system is remarkably robust against a wide range of naturally occurring variations and corruptions like rain or snow. In contrast, the performance of modern image recognition models strongly degrades when evaluated on previously unseen corruptions. Here, we demonstrate that a simple but properly tuned training with additive Gaussian and Speckle noise generalizes surprisingly well to unseen corruptions, easily reaching the previous state of the art on the corruption benchmark ImageNet-C (with ResNet50) and on MNIST-C. We build on top of these strong baseline results and show that an adversarial training of the recognition model against uncorrelated worst-case noise distributions leads to an additional increase in performance. This regularization can be combined with previously proposed defense methods for further improvement.
Texture can be defined as the change of image intensity that forms repetitive patterns, resulting from physical properties of the object's roughness or differences in a reflection on the surface. Considering that texture forms a complex system of patterns in a non-deterministic way, biodiversity concepts can help texture characterization in images. This paper proposes a novel approach capable of quantifying such a complex system of diverse patterns through species diversity and richness and taxonomic distinctiveness. The proposed approach considers each image channel as a species ecosystem and computes species diversity and richness measures as well as taxonomic measures to describe the texture. The proposed approach takes advantage of ecological patterns' invariance characteristics to build a permutation, rotation, and translation invariant descriptor. Experimental results on three datasets of natural texture images and two datasets of histopathological images have shown that the proposed texture descriptor has advantages over several texture descriptors and deep methods.
We introduce MultiDepth, a novel training strategy and convolutional neural network (CNN) architecture that allows approaching single-image depth estimation (SIDE) as a multi-task problem. SIDE is an important part of road scene understanding. It, thus, plays a vital role in advanced driver assistance systems and autonomous vehicles. Best results for the SIDE task so far have been achieved using deep CNNs. However, optimization of regression problems, such as estimating depth, is still a challenging task. For the related tasks of image classification and semantic segmentation, numerous CNN-based methods with robust training behavior have been proposed. Hence, in order to overcome the notorious instability and slow convergence of depth value regression during training, MultiDepth makes use of depth interval classification as an auxiliary task. The auxiliary task can be disabled at test-time to predict continuous depth values using the main regression branch more efficiently. We applied MultiDepth to road scenes and present results on the KITTI depth prediction dataset. In experiments, we were able to show that end-to-end multi-task learning with both, regression and classification, is able to considerably improve training and yield more accurate results.
Transformer, which can benefit from global (long-range) information modeling using self-attention mechanisms, has been successful in natural language processing and 2D image classification recently. However, both local and global features are crucial for dense prediction tasks, especially for 3D medical image segmentation. In this paper, we for the first time exploit Transformer in 3D CNN for MRI Brain Tumor Segmentation and propose a novel network named TransBTS based on the encoder-decoder structure. To capture the local 3D context information, the encoder first utilizes 3D CNN to extract the volumetric spatial feature maps. Meanwhile, the feature maps are reformed elaborately for tokens that are fed into Transformer for global feature modeling. The decoder leverages the features embedded by Transformer and performs progressive upsampling to predict the detailed segmentation map. Experimental results on the BraTS 2019 dataset show that TransBTS outperforms state-of-the-art methods for brain tumor segmentation on 3D MRI scans. Code is available at https://github.com/Wenxuan-1119/TransBTS
The vulnerability of deep neural networks (DNNs) for adversarial examples have attracted more attention. Many algorithms are proposed to craft powerful adversarial examples. However, these algorithms modifying the global or local region of pixels without taking into account network explanations. Hence, the perturbations are redundancy and easily detected by human eyes. In this paper, we propose a novel method to generate local region perturbations. The main idea is to find the contributing feature regions (CFRs) of images based on network explanations for perturbations. Due to the network explanations, the perturbations added to the CFRs are more effective than other regions. In our method, a soft mask matrix is designed to represent the CFRs for finely characterizing the contributions of each pixel. Based on this soft mask, we develop a new objective function with inverse temperature to search for optimal perturbations in CFRs. Extensive experiments are conducted on CIFAR-10 and ILSVRC2012, which demonstrate the effectiveness, including attack success rate, imperceptibility,and transferability.