Semi-supervised learning (SSL) has been proven to be a powerful method for leveraging unlabelled data to alleviate models' dependence on large labelled datasets. The common framework among recent approaches is to train the model on a large amount of unlabelled data with consistency regularization to constrain the model predictions to be invariant to input perturbation. However, the existing SSL frameworks still have room for improvement in the consistency regularization method. Instead of regularizing category predictions in the label space as in existing frameworks, this paper proposes a feature space renormalization (FSR) mechanism for SSL. First, we propose a feature space renormalization mechanism to substitute for the commonly used consistency regularization mechanism to learn better discriminative features. To apply this mechanism, we start by building a basic model and an empirical model and then introduce our mechanism to renormalize the feature learning of the basic model with the guidance of the empirical model. Second, we combine the proposed mechanism with pseudo-labelling to obtain a novel effective SSL model named FreMatch. The experimental results show that our method can achieve better performance on a variety of standard SSL benchmark datasets, and the proposed feature space renormalization mechanism can also enhance the performance of other SSL approaches.
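For context, the label-space consistency regularization that the abstract contrasts with FSR can be sketched as follows. This is a minimal illustration of the baseline idea only (penalizing disagreement between class predictions for two perturbed views of the same unlabelled input). It is not the FSR mechanism or FreMatch, and the function names and the squared-error form of the penalty are assumptions made for this example.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def consistency_loss(logits_a, logits_b):
    """Mean squared difference between the class distributions predicted
    for two perturbed views of the same batch (hypothetical helper)."""
    p, q = softmax(logits_a), softmax(logits_b)
    return float(np.mean(np.sum((p - q) ** 2, axis=1)))

# Two views of a batch of 2 unlabelled samples with 3 classes.
view_a = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]])
view_b = np.array([[1.5, 0.7, -0.8], [0.3, 0.1, 0.2]])
loss = consistency_loss(view_a, view_b)
```

The loss is zero when both views yield identical predictions and grows as the predictions diverge, which is the invariance that such a regularizer encourages.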
Adversarial examples pose a security threat to many critical systems built on neural networks (such as face recognition systems and self-driving cars). While many methods have been proposed to build robust models, how to build certifiably robust yet accurate neural network models remains an open problem. For example, adversarial training improves empirical robustness, but it does not provide certification of the model's robustness. On the other hand, certified training provides certified robustness but at the cost of a significant accuracy drop. In this work, we propose a novel approach that aims to achieve both high accuracy and certified probabilistic robustness. Our method has two parts, i.e., a probabilistic robust training method with an additional goal of minimizing variance in terms of divergence and a runtime inference method for certified probabilistic robustness of the prediction. The latter enables efficient certification of the model's probabilistic robustness at runtime with statistical guarantees. This is supported by our training objective, which minimizes the variance of the model's predictions in a given vicinity, derived from a general definition of model robustness. Our approach works for a variety of perturbations and is reasonably efficient. Our experiments on multiple models trained on different datasets demonstrate that our approach significantly outperforms existing approaches in terms of both certification rate and accuracy.
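One way to picture runtime certification of probabilistic robustness with a statistical guarantee is a sampling-based check. The sketch below is not the authors' inference method: it assumes uniform perturbations in an L-infinity ball and uses a one-sided Hoeffding bound, both swapped in for illustration. The names `certify_probabilistic_robustness`, `predict`, `eps`, `kappa`, and `delta` are hypothetical.

```python
import math
import numpy as np

def certify_probabilistic_robustness(predict, x, eps, n_samples=1000,
                                     delta=0.01, kappa=0.9, seed=0):
    """Sample perturbations around x and lower-bound the probability that
    the prediction stays unchanged (holds with confidence 1 - delta)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    base = predict(x)
    # Assumption: perturbations are uniform in the L-infinity ball of radius eps.
    noise = rng.uniform(-eps, eps, size=(n_samples,) + x.shape)
    agree = sum(1 for d in noise if predict(x + d) == base)
    p_hat = agree / n_samples
    # One-sided Hoeffding margin: P(p < p_hat - t) <= exp(-2 * n * t^2).
    margin = math.sqrt(math.log(1.0 / delta) / (2.0 * n_samples))
    lower = max(0.0, p_hat - margin)
    return bool(lower >= kappa), lower

# Toy classifier: the sign of the feature sum.
toy_predict = lambda v: int(v.sum() > 0)
certified, lower_bound = certify_probabilistic_robustness(
    toy_predict, np.array([1.0, 1.0]), eps=0.1)
```

With 1000 samples and `delta=0.01`, the margin is about 0.048, so an input is certified at `kappa=0.9` only if nearly all sampled perturbations leave the prediction unchanged.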
This paper addresses the issues of controlling and analyzing the population diversity in quantum-behaved particle swarm optimization (QPSO), which is an optimization approach motivated by concepts in quantum mechanics and PSO. In order to gain an in-depth understanding of the role that diversity plays in the evolving process, we first define the genotype diversity by the distance to the average point of the particles' positions and the phenotype diversity by the fitness values for the QPSO. Then, the correlations between the two types of diversity and the search performance are tested and analyzed on several benchmark functions, and the distance-to-average-point diversity is shown to have a stronger association with the search performance during the evolving processes. Finally, in light of the performed diversity analyses, two strategies for controlling the distance-to-average-point diversity are proposed for the purpose of improving the search ability of the QPSO algorithm. Empirical studies on the QPSO with the introduced diversity control methods are performed on a set of benchmark functions from the CEC 2005 benchmark suite. The performance of the proposed methods is evaluated and compared with the original QPSO and other PSO variants.
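The distance-to-average-point measure mentioned above can be sketched as the mean Euclidean distance of the particles from the swarm centroid. This is an assumed, unnormalized form: the paper's exact definition (for example, a division by the length of the search-space diagonal) is not given in the abstract, and the function name is hypothetical.

```python
import numpy as np

def distance_to_average_point(positions):
    """Mean Euclidean distance from each particle to the swarm centroid.

    positions: array of shape (n_particles, n_dimensions).
    Assumption: no normalization by the search-space size is applied.
    """
    positions = np.asarray(positions, dtype=float)
    centroid = positions.mean(axis=0)
    distances = np.linalg.norm(positions - centroid, axis=1)
    return float(distances.mean())

# A spread-out swarm has higher diversity than a collapsed one.
spread = distance_to_average_point([[0.0, 0.0], [2.0, 0.0]])
collapsed = distance_to_average_point([[1.0, 1.0], [1.0, 1.0]])
```

A value near zero indicates that the swarm has collapsed onto a single point, which is the situation that a diversity control strategy would try to avoid.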
Deepfake videos and images are becoming increasingly credible, posing a significant threat given their potential to facilitate fraud or bypass access control systems. This has motivated the development of deepfake detection methods, in which deep learning models are trained to distinguish between real and synthesized footage. Unfortunately, existing detection models struggle to generalize to deepfakes from datasets they were not trained on, but little work has been done to examine why or how this limitation can be addressed. In this paper, we present the first empirical study on the generalizability of deepfake detectors, an essential goal for detectors to stay one step ahead of attackers. Our study utilizes six deepfake datasets, five deepfake detection methods, and two model augmentation approaches, confirming that detectors do not generalize in zero-shot settings. Additionally, we find that detectors are learning unwanted properties specific to synthesis methods and struggling to extract discriminative features, limiting their ability to generalize. Finally, we find that there are neurons universally contributing to detection across seen and unseen datasets, illuminating a possible path forward to zero-shot generalizability.
Machine learning is widely used to make decisions with societal impact, such as bank loan approval, criminal sentencing, and resume filtering. How to ensure its fairness while maintaining utility is a challenging but crucial issue. Fairness is a complex and context-dependent concept with over 70 different measurement metrics. Since existing regulations are often vague in terms of which metric to use and different organizations may prefer different fairness metrics, it is important to have means of improving fairness comprehensively. Existing mitigation techniques often target one specific fairness metric and have limitations in improving multiple notions of fairness simultaneously. In this work, we propose CFU (Comprehensive Fairness-Utility), a reinforcement learning-based framework, to efficiently improve the fairness-utility trade-off in machine learning classifiers. A comprehensive measurement that can simultaneously consider multiple fairness notions as well as utility is established, and new metrics are proposed based on an in-depth analysis of the relationship between different fairness metrics. The reward function of CFU is constructed with the comprehensive measurement and the new metrics. We conduct extensive experiments to evaluate CFU on 6 tasks, 3 machine learning models, and 15 fairness-utility measurements. The results demonstrate that CFU can improve the classifier on multiple fairness metrics without sacrificing its utility. It outperforms all state-of-the-art techniques and achieves a 37.5% improvement on average.
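To illustrate why optimizing a single fairness metric can be insufficient, the sketch below computes two widely used group-fairness metrics on a toy set of predictions. It is not part of CFU, and the abstract does not say which metrics CFU uses. Statistical parity difference and equal opportunity difference are chosen here only as familiar examples, and the function names and data are hypothetical.

```python
import numpy as np

def statistical_parity_difference(y_pred, group):
    """Difference in positive-prediction rates between group 1 and group 0."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return float(y_pred[group == 1].mean() - y_pred[group == 0].mean())

def equal_opportunity_difference(y_true, y_pred, group):
    """Difference in true-positive rates between group 1 and group 0."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    positives = y_true == 1
    tpr_1 = y_pred[positives & (group == 1)].mean()
    tpr_0 = y_pred[positives & (group == 0)].mean()
    return float(tpr_1 - tpr_0)

# Toy data: the first four samples belong to group 1, the last four to group 0.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
group  = [1, 1, 1, 1, 0, 0, 0, 0]
spd = statistical_parity_difference(y_pred, group)
eod = equal_opportunity_difference(y_true, y_pred, group)
```

In this toy case both groups receive positive predictions at the same rate, yet their true-positive rates differ, so a classifier can look fair under one notion and unfair under another.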
Recently, neural networks have spread into numerous fields including many safety-critical systems. Neural networks are built (and trained) by programming in frameworks such as TensorFlow and PyTorch. Developers apply a rich set of pre-defined layers to manually program neural networks or to automatically generate them (e.g., through AutoML). Composing neural networks with different layers is error-prone due to the non-trivial constraints that must be satisfied in order to use those layers. In this work, we propose an approach to automatically repair erroneous neural networks. The challenge is in identifying a minimal modification to the network so that it becomes valid. Modifying a layer might have cascading effects on subsequent layers and thus our approach must search recursively to identify a "globally" minimal modification. Our approach is based on an executable semantics of deep learning layers and focuses on four kinds of errors which are common in practice. We evaluate our approach for two usage scenarios, i.e., repairing automatically generated neural networks and manually written ones suffering from common model bugs. The results show that we are able to repair 100% of a set of randomly generated neural networks (which are produced with an existing AI framework testing approach) effectively and efficiently (with an average repair time of 21.08s) and 93.75% of a collection of real neural network bugs (with an average time of 3min 40s).
Most existing studies on massive grant-free access, proposed to support massive machine-type communications (mMTC) for the Internet of things (IoT), assume Rayleigh fading and perfect synchronization for simplicity. However, in practice, line-of-sight (LoS) components generally exist, and time and frequency synchronization are usually imperfect. This paper systematically investigates maximum likelihood estimation (MLE)-based device activity detection under Rician fading for massive grant-free access with perfect and imperfect synchronization. Specifically, we formulate device activity detection in the synchronous case and joint device activity and offset detection in three asynchronous cases (i.e., time, frequency, and time and frequency asynchronous cases) as MLE problems. In the synchronous case, we propose an iterative algorithm to obtain a stationary point of the MLE problem. In each asynchronous case, we propose two iterative algorithms with identical detection performance but different computational complexities. In particular, one is computationally efficient for small ranges of offsets, whereas the other one, relying on fast Fourier transform (FFT) and inverse FFT, is computationally efficient for large ranges of offsets. The proposed algorithms generalize the existing MLE-based methods for Rayleigh fading and perfect synchronization. Numerical results show the notable gains of the proposed algorithms over existing methods in detection accuracy and computation time.
Overparameterization constitutes one of the most significant hallmarks of deep neural networks. Though it can offer the advantage of outstanding generalization performance, it also imposes a substantial storage burden, thus necessitating the study of network pruning. A natural and fundamental question is: How sparse can we prune a deep network with almost no loss in performance? To address this problem, in this work we take a first-principles approach: by merely enforcing the sparsity constraint on the original loss function, we are able to characterize the sharp phase transition point of the pruning ratio, which corresponds to the boundary between the feasible and the infeasible, from the perspective of high-dimensional geometry. It turns out that the phase transition point of the pruning ratio equals the squared Gaussian width of some convex body resulting from the $l_1$-regularized loss function, normalized by the original dimension of parameters. As a byproduct, we provide a novel network pruning algorithm, which is essentially a global one-shot pruning method. Furthermore, we provide efficient countermeasures to address the challenges in computing the involved Gaussian width, including the spectrum estimation of a large-scale Hessian matrix and dealing with the non-positive-definiteness of a Hessian matrix. It is demonstrated that the predicted pruning ratio threshold coincides very well with the actual value obtained from the experiments and that our proposed pruning algorithm can achieve performance competitive with or even better than existing pruning algorithms. All code is available at: https://github.com/QiaozheZhang/Global-One-shot-Pruning
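The phrase "global one-shot pruning" can be made concrete with a small sketch: weights from all layers are ranked together, a single threshold is chosen, and pruning happens once rather than iteratively. The magnitude criterion used below is an assumption made for illustration, and the authors' algorithm in the linked repository may differ. The function name and the dictionary layout of `layers` are hypothetical.

```python
import numpy as np

def global_one_shot_prune(layers, sparsity):
    """Zero out the smallest-magnitude weights across all layers at once.

    layers: dict mapping a layer name to a weight array.
    sparsity: target fraction of weights to remove (0.0 to 1.0).
    Assumption: weight magnitude is the pruning criterion.
    """
    all_mags = np.concatenate([np.abs(w).ravel() for w in layers.values()])
    k = int(sparsity * all_mags.size)
    if k == 0:
        return {name: w.copy() for name, w in layers.items()}
    # One global threshold: the k-th smallest magnitude over every layer.
    threshold = np.partition(all_mags, k - 1)[k - 1]
    # Weights tied with the threshold are also removed, so slightly more
    # than k weights can be pruned when magnitudes repeat.
    return {name: np.where(np.abs(w) > threshold, w, 0.0)
            for name, w in layers.items()}

layers = {
    "fc1": np.array([0.1, -2.0, 0.3]),
    "fc2": np.array([[1.5, -0.05], [0.2, 4.0]]),
}
pruned = global_one_shot_prune(layers, sparsity=0.5)
```

Because the threshold is shared, layers with many small weights lose more of them than layers with large weights, in contrast to layer-wise pruning with a fixed per-layer ratio.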
We describe SynGraphy, a method for visually summarising the structure of large network datasets that works by drawing smaller graphs generated to have similar structural properties to the input graphs. Visualising complex networks is crucial to understanding and making sense of networked data and the relationships it represents. Due to the large size of many networks, visualisation is extremely difficult; the simple method of drawing large networks like those of Facebook or Twitter leads to graphics that convey little or no information. While modern graph layout algorithms can scale computationally to large networks, their output tends toward a common "hairball" look, which makes it difficult to even distinguish different graphs from each other. Graph sampling and graph coarsening techniques partially address these limitations, but they are only able to preserve a subset of the properties of the original graphs. In this paper we approach the problem of visualising large graphs from a novel perspective: we leave the original graph's nodes and edges behind, and instead summarise its properties, such as the clustering coefficient and bipartivity, by generating a completely new graph whose structural properties match those of the original graph. To verify the utility of this approach as compared to other graph visualisation algorithms, we perform an experimental evaluation in which we repeatedly ask experimental subjects (professionals in graph mining and related areas) to determine which of two given graphs has a given structural property and then assess which visualisation algorithm helped in identifying the correct answer. Our summarisation approach SynGraphy compares favourably to other techniques on a variety of networks.
Deep learning has become a promising programming paradigm in software development, owing to its surprising performance in solving many challenging tasks. Deep neural networks (DNNs) are increasingly being deployed in practice, but are limited on resource-constrained devices owing to their demand for computational power. Quantization has emerged as a promising technique to reduce the size of DNNs with accuracy comparable to that of their floating-point counterparts. The resulting quantized neural networks (QNNs) can be implemented energy-efficiently. As with their floating-point counterparts, quality assurance techniques for QNNs, such as testing and formal verification, are essential but are currently less explored. In this work, we propose a novel and efficient formal verification approach for QNNs. In particular, we are the first to propose an encoding that reduces the verification problem of QNNs to the solving of integer linear constraints, which can be solved using off-the-shelf solvers. Our encoding is both sound and complete. We demonstrate the application of our approach on local robustness verification and maximum robustness radius computation. We implement our approach in a prototype tool QVIP and conduct a thorough evaluation. Experimental results on QNNs with different quantization bits confirm the effectiveness and efficiency of our approach, e.g., it is two orders of magnitude faster and able to solve more verification tasks within the same time limit than the state-of-the-art methods.
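As background for the notion of "quantization bits", the sketch below shows one common way to map floating-point weights to signed integers of a given bit width. It assumes symmetric uniform quantization with a single scale per tensor, which may differ from the scheme used by the QNNs in the paper. It does not implement QVIP or its integer linear encoding, and the function names are hypothetical.

```python
import numpy as np

def quantize(weights, bits=8):
    """Map floats to signed integers in [-(2^(bits-1) - 1), 2^(bits-1) - 1].

    Assumption: symmetric uniform quantization with one scale per tensor.
    Returns the integer array and the scale needed to recover the floats.
    """
    weights = np.asarray(weights, dtype=float)
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.abs(weights).max()
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int64)
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction of the original floating-point values."""
    return q * scale

w = np.array([-1.0, 0.3, 0.9])
q4, scale4 = quantize(w, bits=4)   # integers in [-7, 7]
q8, scale8 = quantize(w, bits=8)   # integers in [-127, 127]
```

Once weights and activations are integers, each layer's arithmetic is integer-valued, which is what makes an encoding into integer linear constraints plausible in the first place. Fewer bits shrink the model further at the price of a larger rounding error.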