This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement workshop (NTIRE) workshop at CVPR 2021. As a new type of image processing technology, perceptual image processing algorithms based on Generative Adversarial Networks (GAN) have produced images with more realistic textures. These output images have completely different characteristics from traditional distortions, thus pose a new challenge for IQA methods to evaluate their visual quality. In comparison with previous IQA challenges, the training and testing datasets in this challenge include the outputs of perceptual image processing algorithms and the corresponding subjective scores. Thus they can be used to develop and evaluate IQA methods on GAN-based distortions. The challenge has 270 registered participants in total. In the final testing stage, 13 participating teams submitted their models and fact sheets. Almost all of them have achieved much better results than existing IQA methods, while the winning method can demonstrate state-of-the-art performance.
This work considers a novel information design problem and studies how the craft of payoff-relevant environmental signals solely can influence the behaviors of intelligent agents. The agents' strategic interactions are captured by an incomplete-information Markov game, in which each agent first selects one environmental signal from multiple signal sources as additional payoff-relevant information and then takes an action. There is a rational information designer (principal) who possesses one signal source and aims to control the equilibrium behaviors of the agents by designing the information structure of her signals sent to the agents. An obedient principle is established which states that it is without loss of generality to focus on the direct information design when the information design incentivizes each agent to select the signal sent by the principal, such that the design process avoids the predictions of the agents' strategic selection behaviors. Based on the obedient principle, we introduce the design protocol given a goal of the principal referred to as obedient implementability (OIL) and study a Myersonian information design that characterizes the OIL in a class of obedient sequential Markov perfect Bayesian equilibria (O-SMPBE). A framework is proposed based on an approach which we refer to as the fixed-point alignment that incentivizes the agents to choose the signal sent by the principal, makes sure that the agents' policy profile of taking actions is the policy component of an O-SMPBE, and the principal's goal is achieved. The proposed approach can be applied to elicit desired behaviors of multi-agent systems in competing as well as cooperating settings and be extended to heterogeneous stochastic games in the complete- and the incomplete-information environments.
In temporal action localization methods, temporal downsampling operations are widely used to extract proposal features, but they often lead to the aliasing problem, due to lacking consideration of sampling rates. This paper aims to verify the existence of aliasing in TAL methods and investigate utilizing low pass filters to solve this problem by inhibiting the high-frequency band. However, the high-frequency band usually contains large amounts of specific information, which is important for model inference. Therefore, it is necessary to make a tradeoff between anti-aliasing and reserving high-frequency information. To acquire optimal performance, this paper learns different cutoff frequencies for different instances dynamically. This design can be plugged into most existing temporal modeling programs requiring only one additional cutoff frequency parameter. Integrating low pass filters to the downsampling operations significantly improves the detection performance and achieves comparable results on THUMOS'14, ActivityNet~1.3, and Charades datasets. Experiments demonstrate that anti-aliasing with low pass filters in TAL is advantageous and efficient.
Deep learning models (with neural networks) have been widely used in challenging tasks such as computer-aided disease diagnosis based on medical images. Recent studies have shown deep diagnostic models may not be robust in the inference process and may pose severe security concerns in clinical practice. Among all the factors that make the model not robust, the most serious one is adversarial examples. The so-called "adversarial example" is a well-designed perturbation that is not easily perceived by humans but results in a false output of deep diagnostic models with high confidence. In this paper, we evaluate the robustness of deep diagnostic models by adversarial attack. Specifically, we have performed two types of adversarial attacks to three deep diagnostic models in both single-label and multi-label classification tasks, and found that these models are not reliable when attacked by adversarial example. We have further explored how adversarial examples attack the models, by analyzing their quantitative classification results, intermediate features, discriminability of features and correlation of estimated labels for both original/clean images and those adversarial ones. We have also designed two new defense methods to handle adversarial examples in deep diagnostic models, i.e., Multi-Perturbations Adversarial Training (MPAdvT) and Misclassification-Aware Adversarial Training (MAAdvT). The experimental results have shown that the use of defense methods can significantly improve the robustness of deep diagnostic models against adversarial attacks.
This work considers a novel information design problem and studies how the craft of payoff-relevant environmental signals solely can influence the behaviors of intelligent agents. The agents' strategic interactions are captured by an incomplete-information Markov game, in which each agent first selects one environmental signal from multiple signal sources as additional payoff-relevant information and then takes an action. There is a rational information designer (designer) who possesses one signal source and aims to control the equilibrium behaviors of the agents by designing the information structure of her signals sent to the agents. An obedient principle is established which states that it is without loss of generality to focus on the direct information design when the information design incentivizes each agent to select the signal sent by the designer, such that the design process avoids the predictions of the agents' strategic selection behaviors. We then introduce the design protocol given a goal of the designer referred to as obedient implementability (OIL) and characterize the OIL in a class of obedient perfect Bayesian Markov Nash equilibria (O-PBME). A new framework for information design is proposed based on an approach of maximizing the optimal slack variables. Finally, we formulate the designer's goal selection problem and characterize it in terms of information design by establishing a relationship between the O-PBME and the Bayesian Markov correlated equilibria, in which we build upon the revelation principle in classic information design in economics. The proposed approach can be applied to elicit desired behaviors of multi-agent systems in competing as well as cooperating settings and be extended to heterogeneous stochastic games in the complete- and the incomplete-information environments.
This paper introduces a new deep learning approach to approximately solve the Covering Salesman Problem (CSP). In this approach, given the city locations of a CSP as input, a deep neural network model is designed to directly output the solution. It is trained using the deep reinforcement learning without supervision. Specifically, in the model, we apply the Multi-head Attention to capture the structural patterns, and design a dynamic embedding to handle the dynamic patterns of the problem. Once the model is trained, it can generalize to various types of CSP tasks (different sizes and topologies) with no need of re-training. Through controlled experiments, the proposed approach shows desirable time complexity: it runs more than 20 times faster than the traditional heuristic solvers with a tiny gap of optimality. Moreover, it significantly outperforms the current state-of-the-art deep learning approaches for combinatorial optimization in the aspect of both training and inference. In comparison with traditional solvers, this approach is highly desirable for most of the challenging tasks in practice that are usually large-scale and require quick decisions.
A range of defense methods have been proposed to improve the robustness of neural networks on adversarial examples, among which provable defense methods have been demonstrated to be effective to train neural networks that are certifiably robust to the attacker. However, most of these provable defense methods treat all examples equally during training process, which ignore the inconsistent constraint of certified robustness between correctly classified (natural) and misclassified examples. In this paper, we explore this inconsistency caused by misclassified examples and add a novel consistency regularization term to make better use of the misclassified examples. Specifically, we identified that the certified robustness of network can be significantly improved if the constraint of certified robustness on misclassified examples and correctly classified examples is consistent. Motivated by this discovery, we design a new defense regularization term called Misclassification Aware Adversarial Regularization (MAAR), which constrains the output probability distributions of all examples in the certified region of the misclassified example. Experimental results show that our proposed MAAR achieves the best certified robustness and comparable accuracy on CIFAR-10 and MNIST datasets in comparison with several state-of-the-art methods.
X-ray scatter remains a major physics challenge in volumetric computed tomography (CT), whose physical and statistical behaviors have been commonly leveraged in order to eliminate its impact on CT image quality. In this work, we conduct an in-depth derivation of how the scatter distribution and scatter to primary ratio (SPR) will change during the spectral correction, leading to an interesting finding on the property of scatter: when applying the spectral correction before scatter is removed, the impact of SPR on a CT projection will be scaled by the first derivative of the mapping function; while the scatter distribution in the transmission domain will be scaled by the product of the first derivative of the mapping function and a natural exponential of the projection difference before and after the mapping. Such a characterization of scatter's behavior provides an analytic approach of compensating for the SPR as well as approximating the change of scatter distribution after spectral correction, even though both of them might be significantly distorted as the linearization mapping function in spectral correction could vary a lot from one detector pixel to another. We conduct an evaluation of SPR compensations on a Catphan phantom and an anthropomorphic chest phantom to validate the characteristics of scatter. In addition, this scatter property is also directly adopted into CT imaging using a spectral modulator with flying focal spot technology (SMFFS) as an example to demonstrate its potential in practical applications.
Visual-tactile fused sensing for object clustering has achieved significant progresses recently, since the involvement of tactile modality can effectively improve clustering performance. However, the missing data (i.e., partial data) issues always happen due to occlusion and noises during the data collecting process. This issue is not well solved by most existing partial multi-view clustering methods for the heterogeneous modality challenge. Naively employing these methods would inevitably induce a negative effect and further hurt the performance. To solve the mentioned challenges, we propose a Generative Partial Visual-Tactile Fused (i.e., GPVTF) framework for object clustering. More specifically, we first do partial visual and tactile features extraction from the partial visual and tactile data, respectively, and encode the extracted features in modality-specific feature subspaces. A conditional cross-modal clustering generative adversarial network is then developed to synthesize one modality conditioning on the other modality, which can compensate missing samples and align the visual and tactile modalities naturally by adversarial learning. To the end, two pseudo-label based KL-divergence losses are employed to update the corresponding modality-specific encoders. Extensive comparative experiments on three public visual-tactile datasets prove the effectiveness of our method.