Xiaolin Huang

MUSO: Achieving Exact Machine Unlearning in Over-Parameterized Regimes

Oct 11, 2024

Ruikai Yang, Mingzhen He, Zhengbao He, Youmei Qiu, Xiaolin Huang

Abstract: Machine unlearning (MU) aims to make a well-trained model behave as if it had never been trained on specific data. In today's over-parameterized models, dominated by neural networks, a common approach is to manually relabel data and fine-tune the well-trained model. This can approximate the MU model in the output space, but the question remains whether it can achieve exact MU, i.e., in the parameter space. We answer this question by employing random feature techniques to construct an analytical framework. Under the premise of model optimization via stochastic gradient descent, we theoretically demonstrate that over-parameterized linear models can achieve exact MU through relabeling specific data. We also extend this work to real-world nonlinear networks and propose an alternating optimization algorithm that unifies the tasks of unlearning and relabeling. Numerical experiments confirm the algorithm's effectiveness and show that it outperforms current state-of-the-art methods across various unlearning scenarios, in particular similar relabeling-based MU approaches.
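
The parameter-space claim can be checked numerically in a simplified setting. The sketch below is not the paper's algorithm: it assumes a plain linear model on raw Gaussian features (no random feature map), full-batch gradient descent instead of SGD, and relabels the forgetting samples with the outputs of the retrained model, which would be unknown in practice. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_remain, n_forget, dim = 15, 5, 200   # dim >> n: over-parameterized regime

X_remain = rng.normal(size=(n_remain, dim))
X_forget = rng.normal(size=(n_forget, dim))
y_remain = rng.normal(size=n_remain)
y_forget = rng.normal(size=n_forget)
X_all = np.vstack([X_remain, X_forget])
y_all = np.concatenate([y_remain, y_forget])

# Minimum-norm interpolators: the limit of gradient descent from zero init.
w_full = np.linalg.pinv(X_all) @ y_all          # trained on all data
w_exact = np.linalg.pinv(X_remain) @ y_remain   # retrained without forget data

# Assumption: relabel forget samples with the retrained model's outputs.
y_relabeled = np.concatenate([y_remain, X_forget @ w_exact])

# Fine-tune the trained model on the relabeled data (full-batch GD).
step = 1.0 / np.linalg.norm(X_all, 2) ** 2      # 1 / largest squared singular value
w = w_full.copy()
for _ in range(500):
    w -= step * X_all.T @ (X_all @ w - y_relabeled)

gap_before = np.linalg.norm(w_full - w_exact)
gap_after = np.linalg.norm(w - w_exact)
print(gap_before, gap_after)
```

Because the iterates never leave the row space of `X_all`, fine-tuning converges to the unique interpolator of the relabeled data in that space, which in this toy setup coincides with the retrained weights `w_exact`.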

Unified Gradient-Based Machine Unlearning with Remain Geometry Enhancement

Sep 29, 2024

Zhehao Huang, Xinwen Cheng, JingHao Zheng, Haoran Wang, Zhengbao He, Tao Li, Xiaolin Huang

Abstract: Machine unlearning (MU) has emerged to enhance the privacy and trustworthiness of deep neural networks. Approximate MU is a practical method for large-scale models. Our investigation into approximate MU starts by identifying the steepest descent direction that minimizes the output Kullback-Leibler divergence to exact MU within a neighborhood of the parameters. This direction decomposes into three components: weighted forgetting gradient ascent, fine-tuning retaining gradient descent, and a weight saliency matrix. This decomposition, derived under the Euclidean metric, encompasses most existing gradient-based MU methods. Nevertheless, adhering to Euclidean space may result in sub-optimal iterative trajectories because it overlooks the geometric structure of the output probability space. We suggest embedding the unlearning update into a manifold rendered by the remaining geometry, incorporating the second-order Hessian from the remaining data. This helps prevent effective unlearning from interfering with the retained performance. However, computing the second-order Hessian for large-scale models is intractable. To efficiently leverage the benefits of Hessian modulation, we propose a fast-slow parameter update strategy to implicitly approximate the up-to-date salient unlearning direction. Free from modality-specific constraints, our approach is adaptable across computer vision unlearning tasks, including classification and generation. Extensive experiments validate the efficacy and efficiency of our method. Notably, it successfully performs class-forgetting on ImageNet using DiT and forgets a class on CIFAR-10 using DDPM in just 50 steps, compared to thousands of steps required by previous methods.

* Accepted by NeurIPS 2024 as a Spotlight paper
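
The three-component decomposition named in the abstract can be illustrated with a single update step. This is only a sketch of the Euclidean form: it assumes the saliency matrix is diagonal (stored as a vector mask), treats `forget_weight` and `lr` as hypothetical hyperparameters, and does not include the Hessian modulation or the fast-slow update proposed in the paper.

```python
import numpy as np

def unlearning_step(theta, grad_forget, grad_retain, saliency_mask,
                    lr=0.1, forget_weight=1.0):
    """One masked update: ascend the forgetting loss, descend the retaining loss."""
    # Ascent on the forgetting loss (+grad_forget), descent on the retaining
    # loss (-grad_retain); the mask restricts the update to salient weights.
    direction = forget_weight * grad_forget - grad_retain
    return theta + lr * saliency_mask * direction

theta = np.array([1.0, -2.0, 0.5])
grad_forget = np.array([0.2, 0.4, -1.0])
grad_retain = np.array([0.1, -0.3, 0.5])
mask = np.array([1.0, 0.0, 1.0])   # second weight is treated as non-salient

theta_new = unlearning_step(theta, grad_forget, grad_retain, mask)
print(theta_new)
```

Setting `forget_weight=0` reduces the step to masked fine-tuning on the remaining data, and an all-ones mask with `grad_retain` set to zero reduces it to plain gradient ascent on the forgetting data.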

Flat-LoRA: Low-Rank Adaption over a Flat Loss Landscape

Sep 22, 2024

Tao Li, Zhengbao He, Yujun Li, Yasheng Wang, Lifeng Shang, Xiaolin Huang

Abstract: Fine-tuning large-scale pre-trained models is prohibitively expensive in terms of computational and memory costs. Low-Rank Adaptation (LoRA), a popular Parameter-Efficient Fine-Tuning (PEFT) method, provides an efficient way to fine-tune models by optimizing only a low-rank matrix. Despite recent progress in improving LoRA's performance, the connection between the LoRA optimization space and the original full parameter space is often overlooked. A solution that appears flat in the LoRA space may still have sharp directions in the full parameter space, potentially harming generalization performance. In this paper, we propose Flat-LoRA, an efficient approach that seeks a low-rank adaptation located in a flat region of the full parameter space. Instead of relying on the well-established sharpness-aware minimization approach, which can incur significant computational and memory burdens, we utilize random weight perturbation with a Bayesian expectation loss objective to maintain training efficiency, and we design a refined perturbation generation strategy for improved performance. Experiments on natural language processing and image classification tasks with various architectures demonstrate the effectiveness of our approach.

* Work in progress
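
A rough sketch of an expectation loss under random weight perturbation follows. It assumes a single linear layer with squared error, isotropic Gaussian noise added to the merged weight `W0 + B @ A`, and a Monte Carlo average; the function name and the noise scale `sigma` are hypothetical, and the paper's refined perturbation generation strategy is not reproduced here.

```python
import numpy as np

def expected_loss_under_perturbation(W0, A, B, X, Y, sigma, n_samples, rng):
    """Monte Carlo estimate of E_eps[loss(W0 + B @ A + eps)], eps ~ N(0, sigma^2)."""
    W = W0 + B @ A                                  # merged full-space weight
    losses = []
    for _ in range(n_samples):
        eps = sigma * rng.normal(size=W.shape)      # noise in the FULL weight space
        pred = X @ (W + eps).T
        losses.append(np.mean((pred - Y) ** 2))
    return float(np.mean(losses))

rng = np.random.default_rng(0)
W0 = rng.normal(size=(4, 6))      # frozen pre-trained weight (out_dim, in_dim)
B = rng.normal(size=(4, 2))       # LoRA factors with rank 2
A = rng.normal(size=(2, 6))
X = rng.normal(size=(32, 6))
Y = X @ (W0 + B @ A).T            # targets the unperturbed model fits exactly

clean = expected_loss_under_perturbation(W0, A, B, X, Y, 0.0, 1, rng)
noisy = expected_loss_under_perturbation(W0, A, B, X, Y, 0.1, 16, rng)
print(clean, noisy)
```

The noise is drawn in the shape of the full weight matrix rather than in the shapes of `A` and `B`, which is the distinction the abstract draws between flatness in the LoRA space and flatness in the full parameter space.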

DDoS: Diffusion Distribution Similarity for Out-of-Distribution Detection

Sep 16, 2024

Kun Fang, Qinghua Tao, Zuopeng Yang, Xiaolin Huang, Jie Yang

Abstract: Out-of-Distribution (OoD) detection determines whether given samples are from the training distribution of the classifier-under-protection, i.e., the In-Distribution (InD), or from a different OoD distribution. Recent research introduces diffusion models pre-trained on InD data to support OoD detection by transferring an OoD image into a generated one that is close to InD, so that the distribution disparities between the original and generated images can be captured to detect OoD data. Existing diffusion-based detectors adopt perceptual metrics on the two images to measure such disparities, but ignore a fundamental fact: perceptual metrics are devised essentially for human-perceived similarities of low-level image patterns, e.g., textures and colors, and are not advisable for evaluating distribution disparities, since images with different low-level patterns could come from the same distribution. To address this issue, we formulate a diffusion-based detection framework that considers the distribution similarity between a tested image and its generated counterpart via a novel proper similarity metric in the informative feature space and probability space learned by the classifier-under-protection. An anomaly-removal strategy is further presented to enlarge such distribution disparities by removing abnormal OoD information in the feature space to facilitate detection. Extensive empirical results reveal the insufficiency of perceptual metrics and the effectiveness of our distribution similarity framework, which achieves new state-of-the-art detection performance.

Scalable Learned Model Soup on a Single GPU: An Efficient Subspace Training Strategy

Jul 04, 2024

Tao Li, Weisen Jiang, Fanghui Liu, Xiaolin Huang, James T. Kwok

Abstract: Pre-training followed by fine-tuning is widely adopted among practitioners. The performance can be improved by "model soups" (Wortsman et al., 2022) via exploring various hyperparameter configurations. The Learned-Soup, a variant of model soups, significantly improves the performance but suffers from substantial memory and time costs due to the requirements of (i) having to load all fine-tuned models simultaneously, and (ii) a large computational graph encompassing all fine-tuned models. In this paper, we propose Memory Efficient Hyperplane Learned Soup (MEHL-Soup) to tackle this issue by formulating the learned soup as a hyperplane optimization problem and introducing block coordinate gradient descent to learn the mixing coefficients. At each iteration, MEHL-Soup only needs to load a few fine-tuned models and build a computational graph with one combined model. We further extend MEHL-Soup to MEHL-Soup+ in a layer-wise manner. Experimental results on various ViT models and data sets show that MEHL-Soup(+) outperforms Learned-Soup(+) in terms of test accuracy, and also reduces memory usage by more than $13\times$. Moreover, MEHL-Soup(+) can be run on a single GPU and achieves a $9\times$ speedup in soup construction compared with the Learned-Soup. The code is released at https://github.com/nblt/MEHL-Soup.

* ECCV 2024
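
The idea of learning mixing coefficients one block at a time can be sketched on a toy problem. The code below is not taken from the released repository: it assumes each "model" is a linear regressor stored as a parameter vector, uses a squared validation loss, leaves the coefficients unconstrained, and picks the step size from the block's Lipschitz constant. The names `learn_soup_blockwise` and `soup_loss` are hypothetical.

```python
import numpy as np

def soup_loss(theta, X_val, y_val):
    return float(np.mean((X_val @ theta - y_val) ** 2))

def learn_soup_blockwise(models, X_val, y_val, block_size=2, n_epochs=20):
    """Learn mixing coefficients with block coordinate gradient descent."""
    n_models = len(models)
    alpha = np.full(n_models, 1.0 / n_models)            # start from the uniform soup
    theta = sum(a * m for a, m in zip(alpha, models))    # current combined model
    for _ in range(n_epochs):
        for start in range(0, n_models, block_size):
            block = list(range(start, min(start + block_size, n_models)))
            # Only the models in this block are touched in this update.
            P = np.stack([X_val @ models[i] for i in block])   # (|block|, n_val)
            residual = X_val @ theta - y_val
            grad = 2.0 / len(y_val) * P @ residual
            lipschitz = 2.0 / len(y_val) * np.linalg.norm(P, 2) ** 2
            step = -grad / lipschitz
            alpha[block] += step
            # Keep one combined model up to date instead of re-mixing all models.
            theta = theta + sum(s * models[i] for s, i in zip(step, block))
    return alpha, theta

rng = np.random.default_rng(0)
theta_true = rng.normal(size=5)
models = [theta_true + 0.5 * rng.normal(size=5) for _ in range(4)]
X_val = rng.normal(size=(50, 5))
y_val = X_val @ theta_true

uniform_loss = soup_loss(sum(models) / len(models), X_val, y_val)
alpha, theta_soup = learn_soup_blockwise(models, X_val, y_val)
learned_loss = soup_loss(theta_soup, X_val, y_val)
print(uniform_loss, learned_loss)
```

Each inner update reads only `block_size` models plus the single combined vector `theta`, which mirrors the memory argument in the abstract; a real implementation would load those models from disk per block.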

Learning Analysis of Kernel Ridgeless Regression with Asymmetric Kernel Learning

Jun 03, 2024

Fan He, Mingzhen He, Lei Shi, Xiaolin Huang, Johan A. K. Suykens

Abstract: Ridgeless regression has garnered attention among researchers, particularly in light of the "Benign Overfitting" phenomenon, where models interpolating noisy samples demonstrate robust generalization. However, kernel ridgeless regression does not always perform well due to its lack of flexibility. This paper enhances kernel ridgeless regression with Locally-Adaptive-Bandwidths (LAB) RBF kernels, incorporating kernel learning techniques to improve performance in both experiments and theory. For the first time, we demonstrate that functions learned from LAB RBF kernels belong to an integral space of Reproducing Kernel Hilbert Spaces (RKHSs). Despite the absence of explicit regularization in the proposed model, its optimization is equivalent to solving an $\ell_0$-regularized problem in the integral space of RKHSs, elucidating the origin of its generalization ability. Taking an approximation analysis viewpoint, we introduce an $\ell_q$-norm analysis technique (with $0<q<1$) to derive the learning rate for the proposed model under mild conditions. This result deepens our theoretical understanding, explaining that our algorithm's robust approximation ability arises from the large capacity of the integral space of RKHSs, while its generalization ability is ensured by sparsity, controlled by the number of support vectors. Experimental results on both synthetic and real datasets validate our theoretical conclusions.

* arXiv admin note: text overlap with arXiv:2310.05236
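
A small sketch of ridgeless interpolation with an asymmetric, locally adaptive RBF kernel follows. The kernel form is an assumption made for illustration (one scalar bandwidth per support vector, applied to the distance from that support vector), the bandwidths are fixed by hand rather than learned, and `lab_rbf_kernel` is a hypothetical name.

```python
import numpy as np

def lab_rbf_kernel(X_eval, X_support, bandwidths):
    """Asymmetric RBF kernel: K[i, j] = exp(-(bandwidths[j] * ||x_i - s_j||)^2)."""
    diff = X_eval[:, None, :] - X_support[None, :, :]   # (n_eval, n_support, dim)
    scaled = diff * bandwidths[None, :, None]
    return np.exp(-np.sum(scaled ** 2, axis=-1))

X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.5, -1.0, 2.0, 0.0])
bandwidths = np.array([1.0, 2.0, 1.5, 0.8])   # one bandwidth per support vector

K = lab_rbf_kernel(X_train, X_train, bandwidths)
coef = np.linalg.solve(K, y_train)            # ridgeless: no regularization term

def predict(X_new):
    return lab_rbf_kernel(X_new, X_train, bandwidths) @ coef

print(predict(X_train))
```

Because each column of `K` uses its own bandwidth, `K` is not symmetric, yet the model still interpolates the training labels since the linear system is solved without a ridge term.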

Pursuing Feature Separation based on Neural Collapse for Out-of-Distribution Detection

May 28, 2024

Yingwen Wu, Ruiji Yu, Xinwen Cheng, Zhengbao He, Xiaolin Huang

Abstract: In the open world, detecting out-of-distribution (OOD) data, whose labels are disjoint from those of in-distribution (ID) samples, is important for reliable deep neural networks (DNNs). To achieve better detection performance, one type of approach fine-tunes the model with auxiliary OOD datasets to amplify the difference between ID and OOD data through a separation loss defined on model outputs. However, none of these studies considers enlarging the feature disparity, which should be more effective than enlarging output differences. The main difficulty lies in the diversity of OOD samples, which makes it hard to describe their feature distribution, let alone design losses to separate them from ID features. In this paper, we neatly fence off the problem based on an aggregation property of ID features named Neural Collapse (NC). NC means that the penultimate features of ID samples within a class are nearly identical to the last-layer weight of the corresponding class. Based on this property, we propose a simple but effective loss called OrthLoss, which binds the features of OOD data in a subspace orthogonal to the principal subspace of ID features formed by NC. In this way, the features of ID and OOD samples are separated by different dimensions. By optimizing the feature separation loss rather than purely enlarging output differences, our detection achieves SOTA performance on CIFAR benchmarks without any additional data augmentation or sampling, demonstrating the importance of feature separation in OOD detection. The code will be published.
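
One way to express "features confined to a subspace orthogonal to the ID subspace" as a penalty is sketched below. The exact form of OrthLoss is not given in the abstract, so this is an assumed variant: it takes the span of the last-layer class weights as the ID subspace and penalizes the fraction of each OOD feature's squared norm that falls inside it. The function name is hypothetical.

```python
import numpy as np

def orthogonality_loss(ood_features, class_weights, eps=1e-12):
    """Mean fraction of each OOD feature's energy lying in span(class_weights)."""
    # Orthogonal projector onto the row space of the class-weight matrix;
    # the pseudo-inverse keeps this valid when the weights are rank-deficient.
    projector = np.linalg.pinv(class_weights) @ class_weights   # (dim, dim)
    inside = np.sum((ood_features @ projector) ** 2, axis=1)
    total = np.sum(ood_features ** 2, axis=1) + eps
    return float(np.mean(inside / total))

class_weights = np.array([[1.0, 0.0, 0.0],
                          [0.0, 1.0, 0.0]])      # 2 classes, 3-dim features

loss_inside = orthogonality_loss(np.array([[2.0, 0.0, 0.0]]), class_weights)
loss_orthogonal = orthogonality_loss(np.array([[0.0, 0.0, 3.0]]), class_weights)
loss_mixed = orthogonality_loss(np.array([[1.0, 0.0, 1.0]]), class_weights)
print(loss_inside, loss_orthogonal, loss_mixed)
```

The penalty is zero exactly when an OOD feature has no component along any class weight, so minimizing it pushes OOD features into dimensions that ID features do not use.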

Towards Natural Machine Unlearning

May 24, 2024

Zhengbao He, Tao Li, Xinwen Cheng, Zhehao Huang, Xiaolin Huang

Abstract: Machine unlearning (MU) aims to eliminate information that has been learned from specific training data, namely forgetting data, from a pre-trained model. Currently, mainstream MU methods modify the forgetting data with incorrect labels and subsequently fine-tune the model. While learning such incorrect information can indeed remove knowledge, the process is quite unnatural, as it undesirably reinforces the incorrect information and leads to over-forgetting. Towards more natural machine unlearning, we inject correct information from the remaining data into the forgetting samples when changing their labels. By pairing these adjusted samples with their labels, the model will tend to use the injected correct information and naturally suppress the information meant to be forgotten. Albeit straightforward, such a first step towards natural machine unlearning can significantly outperform current state-of-the-art approaches. In particular, our method substantially reduces over-forgetting and is robust to hyperparameters, making it a promising candidate for practical machine unlearning.

Decentralized Kernel Ridge Regression Based on Data-dependent Random Feature

May 13, 2024

Ruikai Yang, Fan He, Mingzhen He, Jie Yang, Xiaolin Huang

Abstract: Random features (RFs) have been widely used for node consistency in decentralized kernel ridge regression (KRR). Currently, consistency is guaranteed by imposing constraints on the coefficients of features, which requires the random features on different nodes to be identical. However, in many applications, data on different nodes vary significantly in number or distribution, which calls for adaptive and data-dependent methods that generate different RFs. To tackle this essential difficulty, we propose a new decentralized KRR algorithm that pursues consensus on decision functions, which allows great flexibility and adapts well to the data on each node. The convergence is rigorously established and the effectiveness is numerically verified: by capturing the characteristics of the data on each node while maintaining the same communication costs as other methods, we achieve an average regression accuracy improvement of 25.5% across six real-world data sets.
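
The building block behind this setting, ridge regression on random Fourier features, can be sketched per node as follows. This shows only local fitting with node-specific random features; it assumes standard data-independent random Fourier features for an RBF kernel and does not implement the paper's data-dependent features or its consensus procedure. The function names, the kernel width `gamma`, and the two toy nodes are illustrative.

```python
import numpy as np

def random_fourier_features(X, W, b):
    """Map X (n, dim) to features whose inner products approximate an RBF kernel."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

def fit_rf_ridge(X, y, n_features=200, gamma=1.0, ridge=1e-3, seed=0):
    """Ridge regression on random features drawn with a node-specific seed."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    Z = random_fourier_features(X, W, b)
    coef = np.linalg.solve(Z.T @ Z + ridge * np.eye(n_features), Z.T @ y)
    return lambda X_new: random_fourier_features(X_new, W, b) @ coef

# Two nodes hold data from different regions and draw different features.
X_a = np.linspace(-3.0, 0.0, 40).reshape(-1, 1)
X_b = np.linspace(0.0, 3.0, 40).reshape(-1, 1)
f_a = fit_rf_ridge(X_a, np.sin(X_a[:, 0]), seed=1)
f_b = fit_rf_ridge(X_b, np.sin(X_b[:, 0]), seed=2)

mse_a = float(np.mean((f_a(X_a) - np.sin(X_a[:, 0])) ** 2))
mse_b = float(np.mean((f_b(X_b) - np.sin(X_b[:, 0])) ** 2))

# Coefficients live in different feature spaces, so nodes can only be
# compared through their decision functions on shared evaluation points.
X_shared = np.linspace(-3.0, 3.0, 25).reshape(-1, 1)
disagreement = float(np.mean((f_a(X_shared) - f_b(X_shared)) ** 2))
print(mse_a, mse_b, disagreement)
```

With different seeds the two coefficient vectors are not comparable entry by entry, which is why a consensus constraint on coefficients forces identical features, whereas a discrepancy measured on function values, as in `disagreement`, does not.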

Data Imputation by Pursuing Better Classification: A Supervised Kernel-Based Method

May 13, 2024

Ruikai Yang, Fan He, Mingzhen He, Kaijie Wang, Xiaolin Huang

Abstract: Data imputation, the process of filling in missing feature elements for incomplete data sets, plays a crucial role in data-driven learning. A fundamental belief is that data imputation is helpful for learning performance, and it follows that the pursuit of better classification can guide the data imputation process. While some works consider using label information to assist in this task, their simplistic utilization of labels lacks flexibility and may rely on strict assumptions. In this paper, we propose a new framework that effectively leverages supervision information to complete missing data in a manner conducive to classification. Specifically, this framework operates in two stages. First, it leverages labels to supervise the optimization of similarity relationships among data, represented by the kernel matrix, with the goal of enhancing classification accuracy. To mitigate overfitting that may occur during this process, a perturbation variable is introduced to improve the robustness of the framework. Second, the learned kernel matrix serves as additional supervision information to guide data imputation through regression, utilizing the block coordinate descent method. The proposed method is evaluated on four real-world data sets against state-of-the-art imputation methods. Remarkably, our algorithm significantly outperforms other methods when the data is missing more than 60% of the features.