A wide variety of adversarial attacks have been proposed and explored using image and audio data. These attacks are notoriously easy to generate digitally when the attacker can directly manipulate the input to a model, but are much more difficult to implement in the real-world. In this paper we present a universal, time invariant attack for general time series data such that the attack has a frequency spectrum primarily composed of the frequencies present in the original data. The universality of the attack makes it fast and easy to implement as no computation is required to add it to an input, while time invariance is useful for real-world deployment. Additionally, the frequency constraint ensures the attack can withstand filtering. We demonstrate the effectiveness of the attack in two different domains, speech recognition and unintended radiated emission, and show that the attack is robust against common transform-and-compare defense pipelines.
Deep learning approaches achieve state-of-the-art performance for classifying radiology images, but rely on large labelled datasets that require resource-intensive annotation by specialists. Both semi-supervised learning and active learning can be utilised to mitigate this annotation burden. However, there is limited work on combining the advantages of semi-supervised and active learning approaches for multi-label medical image classification. Here, we introduce a novel Consistency-based Semi-supervised Evidential Active Learning framework (CSEAL). Specifically, we leverage predictive uncertainty based on theories of evidence and subjective logic to develop an end-to-end integrated approach that combines consistency-based semi-supervised learning with uncertainty-based active learning. We apply our approach to enhance four leading consistency-based semi-supervised learning methods: Pseudo-labelling, Virtual Adversarial Training, Mean Teacher and NoTeacher. Extensive evaluations on multi-label Chest X-Ray classification tasks demonstrate that CSEAL achieves substantive performance improvements over two leading semi-supervised active learning baselines. Further, a class-wise breakdown of results shows that our approach can substantially improve accuracy on rarer abnormalities with fewer labelled samples.
Vision Transformer are very popular nowadays due to their state-of-the-art performance in several computer vision tasks, such as image classification and action recognition. Although the performance of Vision Transformers have been greatly improved by employing Convolutional Neural Networks, hierarchical structures and compact forms, there is limited research on ways to utilize additional data representations to refine the attention map derived from the multi-head attention of a Transformer network. This work proposes a novel attention mechanism, called multi-manifold attention, that can substitute any standard attention mechanism in a Transformer-based network. The proposed attention models the input space in three distinct manifolds, namely Euclidean, Symmetric Positive Definite and Grassmann, with different statistical and geometrical properties, guiding the network to take into consideration a rich set of information that describe the appearance, color and texture of an image, for the computation of a highly descriptive attention map. In this way, a Vision Transformer with the proposed attention is guided to become more attentive towards discriminative features, leading to improved classification results, as shown by the experimental results on several well-known image classification datasets.
During the past years,deep convolutional neural networks have achieved impressive success in low-light Image Enhancement.Existing deep learning methods mostly enhance the ability of feature extraction by stacking network structures and deepening the depth of the network.which causes more runtime cost on single image.In order to reduce inference time while fully extracting local features and global features.Inspired by SGN,we propose a Attention based Broadly self-guided network (ABSGN) for real world low-light image Enhancement.such a broadly strategy is able to handle the noise at different exposures.The proposed network is validated by many mainstream benchmark.Additional experimental results show that the proposed network outperforms most of state-of-the-art low-light image Enhancement solutions.
Anomaly detection with only prior knowledge from normal samples attracts more attention because of the lack of anomaly samples. Existing CNN-based pixel reconstruction approaches suffer from two concerns. First, the reconstruction source and target are raw pixel values that contain indistinguishable semantic information. Second, CNN tends to reconstruct both normal samples and anomalies well, making them still hard to distinguish. In this paper, we propose Anomaly Detection TRansformer (ADTR) to apply a transformer to reconstruct pre-trained features. The pre-trained features contain distinguishable semantic information. Also, the adoption of transformer limits to reconstruct anomalies well such that anomalies could be detected easily once the reconstruction fails. Moreover, we propose novel loss functions to make our approach compatible with the normal-sample-only case and the anomaly-available case with both image-level and pixel-level labeled anomalies. The performance could be further improved by adding simple synthetic or external irrelevant anomalies. Extensive experiments are conducted on anomaly detection datasets including MVTec-AD and CIFAR-10. Our method achieves superior performance compared with all baselines.
Shape reconstruction of deformable organs from two-dimensional X-ray images is a key technology for image-guided intervention. In this paper, we propose an image-to-graph convolutional network (IGCN) for deformable shape reconstruction from a single-viewpoint projection image. The IGCN learns relationship between shape/deformation variability and the deep image features based on a deformation mapping scheme. In experiments targeted to the respiratory motion of abdominal organs, we confirmed the proposed framework with a regularized loss function can reconstruct liver shapes from a single digitally reconstructed radiograph with a mean distance error of 3.6mm.
Previous image forensics schemes for crop detection are only limited on predicting whether an image has been cropped. This paper presents a novel scheme for image crop localization using robust watermarking. We further extend our scheme to detect tampering attack on the attacked image. We demonstrate that our scheme is the first to provide high-accuracy and robust image crop localization. Besides, the accuracy of tamper detection is comparable to many state-of-the-art methods.
Metal artifact correction is a challenging problem in cone beam computed tomography (CBCT) scanning. Metal implants inserted into the anatomy cause severe artifacts in reconstructed images. Widely used inpainting-based metal artifact reduction (MAR) methods require segmentation of metal traces in the projections as a first step which is a challenging task. One approach is to use a deep learning method to segment metals in the projections. However, the success of deep learning methods is limited by the availability of realistic training data. It is challenging and time consuming to get reliable ground truth annotations due to unclear implant boundary and large number of projections. We propose to use X-ray simulations to generate synthetic metal segmentation training dataset from clinical CBCT scans. We compare the effect of simulations with different number of photons and also compare several training strategies to augment the available data. We compare our model's performance on real clinical scans with conventional threshold-based MAR and a recent deep learning method. We show that simulations with relatively small number of photons are suitable for the metal segmentation task and that training the deep learning model with full size and cropped projections together improves the robustness of the model. We show substantial improvement in the image quality affected by severe motion, voxel size under-sampling, and out-of-FOV metals. Our method can be easily implemented into the existing projection-based MAR pipeline to get improved image quality. This method can provide a novel paradigm to accurately segment metals in CBCT projections.
This paper considers few-shot anomaly detection (FSAD), a practical yet under-studied setting for anomaly detection (AD), where only a limited number of normal images are provided for each category at training. So far, existing FSAD studies follow the one-model-per-category learning paradigm used for standard AD, and the inter-category commonality has not been explored. Inspired by how humans detect anomalies, i.e., comparing an image in question to normal images, we here leverage registration, an image alignment task that is inherently generalizable across categories, as the proxy task, to train a category-agnostic anomaly detection model. During testing, the anomalies are identified by comparing the registered features of the test image and its corresponding support (normal) images. As far as we know, this is the first FSAD method that trains a single generalizable model and requires no re-training or parameter fine-tuning for new categories. Experimental results have shown that the proposed method outperforms the state-of-the-art FSAD methods by 3%-8% in AUC on the MVTec and MPDD benchmarks.
Memes are prevalent on the internet and continue to grow and evolve alongside our culture. An automatic understanding of memes propagating on the internet can shed light on the general sentiment and cultural attitudes of people. In this work, we present team BLUE's solution for the second edition of the MEMOTION competition. We showcase two approaches for meme classification (i.e. sentiment, humour, offensive, sarcasm and motivation levels) using a text-only method using BERT, and a Multi-Modal-Multi-Task transformer network that operates on both the meme image and its caption to output the final scores. In both approaches, we leverage state-of-the-art pretrained models for text (BERT, Sentence Transformer) and image processing (EfficientNetV4, CLIP). Through our efforts, we obtain first place in task A, second place in task B and third place in task C. In addition, our team obtained the highest average score for all three tasks.