The latest advances in full-reference image quality assessment (IQA) involve unifying structure and texture similarity based on deep representations. The resulting Deep Image Structure and Texture Similarity (DISTS) metric, however, makes rather global quality measurements, ignoring the fact that natural photographic images are locally structured and textured across space and scale. In this paper, we describe a locally adaptive structure and texture similarity index for full-reference IQA, which we term A-DISTS. Specifically, we rely on a single statistical feature, namely the dispersion index, to localize texture regions at different scales. The estimated probability (of one patch being texture) is in turn used to adaptively pool local structure and texture measurements. The resulting A-DISTS is adapted to local image content, and is free of expensive human perceptual scores for supervised training. We demonstrate the advantages of A-DISTS in terms of correlation with human data on ten IQA databases and optimization of single image super-resolution methods.
Existing efforts on Just noticeable difference (JND) estimation mainly dedicate to modeling the visibility masking effects of different factors in spatial and frequency domains, and then fusing them into an overall JND estimate. However, the overall visibility masking effect can be related with more contributing factors beyond those have been considered in the literature and it is also insufficiently accurate to formulate the masking effect even for an individual factor. Moreover, the potential interactions among different masking effects are also difficult to be characterized with a simple fusion model. In this work, we turn to a dramatically different way to address these problems with a top-down design philosophy. Instead of formulating and fusing multiple masking effects in a bottom-up way, the proposed JND estimation model directly generates a critical perceptual lossless (CPL) image from a top-down perspective and calculates the difference map between the original image and the CPL image as the final JND map. Given an input image, an adaptively critical point (perceptual lossless threshold), defined as the minimum number of spectral components in Karhunen-Lo\'{e}ve Transform (KLT) used for perceptual lossless image reconstruction, is derived by exploiting the convergence characteristics of KLT coefficient energy. Then, the CPL image can be reconstructed via inverse KLT according to the derived critical point. Finally, the difference map between the original image and the CPL image is calculated as the JND map. The performance of the proposed JND model is evaluated with two applications including JND-guided noise injection and JND-guided image compression. Experimental results have demonstrated that our proposed JND model can achieve better performance than several latest JND models.
In this paper, we propose a no-reference (NR) image quality assessment (IQA) method via feature level pseudo-reference (PR) hallucination. The proposed quality assessment framework is grounded on the prior models of natural image statistical behaviors and rooted in the view that the perceptually meaningful features could be well exploited to characterize the visual quality. Herein, the PR features from the distorted images are learned by a mutual learning scheme with the pristine reference as the supervision, and the discriminative characteristics of PR features are further ensured with the triplet constraints. Given a distorted image for quality inference, the feature level disentanglement is performed with an invertible neural layer for final quality prediction, leading to the PR and the corresponding distortion features for comparison. The effectiveness of our proposed method is demonstrated on four popular IQA databases, and superior performance on cross-database evaluation also reveals the high generalization capability of our method. The implementation of our method is publicly available on https://github.com/Baoliang93/FPR.
The technological advancements of deep learning have enabled sophisticated face manipulation schemes, raising severe trust issues and security concerns in modern society. Generally speaking, detecting manipulated faces and locating the potentially altered regions are challenging tasks. Herein, we propose a conceptually simple but effective method to efficiently detect forged faces in an image while simultaneously locating the manipulated regions. The proposed scheme relies on a segmentation map that delivers meaningful high-level semantic information clues about the image. Furthermore, a noise map is estimated, playing a complementary role in capturing low-level clues and subsequently empowering decision-making. Finally, the features from these two modules are combined to distinguish fake faces. Extensive experiments show that the proposed model achieves state-of-the-art detection accuracy and remarkable localization performance.
The research of visual signal compression has a long history. Fueled by deep learning, exciting progress has been made recently. Despite achieving better compression performance, existing end-to-end compression algorithms are still designed towards better signal quality in terms of rate-distortion optimization. In this paper, we show that the design and optimization of network architecture could be further improved for compression towards machine vision. We propose an inverted bottleneck structure for end-to-end compression towards machine vision, which specifically accounts for efficient representation of the semantic information. Moreover, we quest the capability of optimization by incorporating the analytics accuracy into the optimization process, and the optimality is further explored with generalized rate-accuracy optimization in an iterative manner. We use object detection as a showcase for end-to-end compression towards machine vision, and extensive experiments show that the proposed scheme achieves significant BD-rate savings in terms of analysis performance. Moreover, the promise of the scheme is also demonstrated with strong generalization capability towards other machine vision tasks, due to the enabling of signal-level reconstruction.
Recent works have proposed methods to train classifiers with local robustness properties, which can provably eliminate classes of evasion attacks for most inputs, but not all inputs. Since data distribution shift is very common in security applications, e.g., often observed for malware detection, local robustness cannot guarantee that the property holds for unseen inputs at the time of deploying the classifier. Therefore, it is more desirable to enforce global robustness properties that hold for all inputs, which is strictly stronger than local robustness. In this paper, we present a framework and tools for training classifiers that satisfy global robustness properties. We define new notions of global robustness that are more suitable for security classifiers. We design a novel booster-fixer training framework to enforce global robustness properties. We structure our classifier as an ensemble of logic rules and design a new verifier to verify the properties. In our training algorithm, the booster increases the classifier's capacity, and the fixer enforces verified global robustness properties following counterexample guided inductive synthesis. To the best of our knowledge, the only global robustness property that has been previously achieved is monotonicity. Several previous works have defined global robustness properties, but their training techniques failed to achieve verified global robustness. In comparison, we show that we can train classifiers to satisfy different global robustness properties for three security datasets, and even multiple properties at the same time, with modest impact on the classifier's performance. For example, we train a Twitter spam account classifier to satisfy five global robustness properties, with 5.4% decrease in true positive rate, and 0.1% increase in false positive rate, compared to a baseline XGBoost model that doesn't satisfy any property.
Deep neural networks (DNN) have demonstrated unprecedented success for medical imaging applications. However, due to the issue of limited dataset availability and the strict legal and ethical requirements for patient privacy protection, the broad applications of medical imaging classification driven by DNN with large-scale training data have been largely hindered. For example, when training the DNN from one domain (e.g., with data only from one hospital), the generalization capability to another domain (e.g., data from another hospital) could be largely lacking. In this paper, we aim to tackle this problem by developing the privacy-preserving constrained domain generalization method, aiming to improve the generalization capability under the privacy-preserving condition. In particular, We propose to improve the information aggregation process on the centralized server-side with a novel gradient alignment loss, expecting that the trained model can be better generalized to the "unseen" but related medical images. The rationale and effectiveness of our proposed method can be explained by connecting our proposed method with the Maximum Mean Discrepancy (MMD) which has been widely adopted as the distribution distance measurement. Experimental results on two challenging medical imaging classification tasks indicate that our method can achieve better cross-domain generalization capability compared to the state-of-the-art federated learning methods.
Recent works in neural network verification show that cheap incomplete verifiers such as CROWN, based upon bound propagations, can effectively be used in Branch-and-Bound (BaB) methods to accelerate complete verification, achieving significant speedups compared to expensive linear programming (LP) based techniques. However, they cannot fully handle the per-neuron split constraints introduced by BaB like LP verifiers do, leading to looser bounds and hurting their verification efficiency. In this work, we develop $\beta$-CROWN, a new bound propagation based method that can fully encode per-neuron splits via optimizable parameters $\beta$. When the optimizable parameters are jointly optimized in intermediate layers, $\beta$-CROWN has the potential of producing better bounds than typical LP verifiers with neuron split constraints, while being efficiently parallelizable on GPUs. Applied to the complete verification setting, $\beta$-CROWN is close to three orders of magnitude faster than LP-based BaB methods for robustness verification, and also over twice faster than state-of-the-art GPU-based complete verifiers with similar timeout rates. By terminating BaB early, our method can also be used for incomplete verification. Compared to the state-of-the-art semidefinite-programming (SDP) based verifier, we show a substantial leap forward by greatly reducing the gap between verified accuracy and empirical adversarial attack accuracy, from 35% (SDP) to 12% on an adversarially trained MNIST network ($\epsilon=0.3$), while being 47 times faster. Our code is available at https://github.com/KaidiXu/Beta-CROWN
This paper focuses on the domain generalization task where domain knowledge is unavailable, and even worse, only samples from a single domain can be utilized during training. Our motivation originates from the recent progresses in deep neural network (DNN) testing, which has shown that maximizing neuron coverage of DNN can help to explore possible defects of DNN (i.e., misclassification). More specifically, by treating the DNN as a program and each neuron as a functional point of the code, during the network training we aim to improve the generalization capability by maximizing the neuron coverage of DNN with the gradient similarity regularization between the original and augmented samples. As such, the decision behavior of the DNN is optimized, avoiding the arbitrary neurons that are deleterious for the unseen samples, and leading to the trained DNN that can be better generalized to out-of-distribution samples. Extensive studies on various domain generalization tasks based on both single and multiple domain(s) setting demonstrate the effectiveness of our proposed approach compared with state-of-the-art baseline methods. We also analyze our method by conducting visualization based on network dissection. The results further provide useful evidence on the rationality and effectiveness of our approach.