Abstract:Semi-supervised learning (SSL) has achieved significant progress by leveraging both labeled data and unlabeled data. Existing SSL methods overlook a common real-world scenario when labeled data is extremely scarce, potentially as limited as a single labeled sample in the dataset. General SSL approaches struggle to train effectively from scratch under such constraints, while methods utilizing pre-trained models often fail to find an optimal balance between leveraging limited labeled data and abundant unlabeled data. To address this challenge, we propose Firstly Adapt, Then catEgorize (FATE), a novel SSL framework tailored for scenarios with extremely limited labeled data. At its core, the two-stage prompt tuning paradigm FATE exploits unlabeled data to compensate for scarce supervision signals, then transfers to downstream tasks. Concretely, FATE first adapts a pre-trained model to the feature distribution of downstream data using volumes of unlabeled samples in an unsupervised manner. It then applies an SSL method specifically designed for pre-trained models to complete the final classification task. FATE is designed to be compatible with both vision and vision-language pre-trained models. Extensive experiments demonstrate that FATE effectively mitigates challenges arising from the scarcity of labeled samples in SSL, achieving an average performance improvement of 33.74% across seven benchmarks compared to state-of-the-art SSL methods. Code is available at https://anonymous.4open.science/r/Semi-supervised-learning-BA72.
Abstract:In real-world datasets, the challenges of long-tailed distributions and noisy labels often coexist, posing obstacles to the model training and performance. Existing studies on long-tailed noisy label learning (LTNLL) typically assume that the generation of noisy labels is independent of the long-tailed distribution, which may not be true from a practical perspective. In real-world situaiton, we observe that the tail class samples are more likely to be mislabeled as head, exacerbating the original degree of imbalance. We call this phenomenon as ``tail-to-head (T2H)'' noise. T2H noise severely degrades model performance by polluting the head classes and forcing the model to learn the tail samples as head. To address this challenge, we investigate the dynamic misleading process of the nosiy labels and propose a novel method called Disentangling and Unlearning for Long-tailed and Label-noisy data (DULL). It first employs the Inner-Feature Disentangling (IFD) to disentangle feature internally. Based on this, the Inner-Feature Partial Unlearning (IFPU) is then applied to weaken and unlearn incorrect feature regions correlated to wrong classes. This method prevents the model from being misled by noisy labels, enhancing the model's robustness against noise. To provide a controlled experimental environment, we further propose a new noise addition algorithm to simulate T2H noise. Extensive experiments on both simulated and real-world datasets demonstrate the effectiveness of our proposed method.
Abstract:Visual prompt tuning (VPT) provides an efficient and effective solution for adapting pre-trained models to various downstream tasks by incorporating learnable prompts. However, most prior art indiscriminately applies a fixed prompt distribution across different tasks, neglecting the importance of each block differing depending on the task. In this paper, we investigate adaptive distribution optimization (ADO) by addressing two key questions: (1) How to appropriately and formally define ADO, and (2) How to design an adaptive distribution strategy guided by this definition? Through in-depth analysis, we provide an affirmative answer that properly adjusting the distribution significantly improves VPT performance, and further uncover a key insight that a nested relationship exists between ADO and VPT. Based on these findings, we propose a new VPT framework, termed PRO-VPT (iterative Prompt RelOcation-based VPT), which adaptively adjusts the distribution building upon a nested optimization formulation. Specifically, we develop a prompt relocation strategy for ADO derived from this formulation, comprising two optimization steps: identifying and pruning idle prompts, followed by determining the optimal blocks for their relocation. By iteratively performing prompt relocation and VPT, our proposal adaptively learns the optimal prompt distribution, thereby unlocking the full potential of VPT. Extensive experiments demonstrate that our proposal significantly outperforms state-of-the-art VPT methods, e.g., PRO-VPT surpasses VPT by 1.6% average accuracy, leading prompt-based methods to state-of-the-art performance on the VTAB-1k benchmark. The code is available at https://github.com/ckshang/PRO-VPT.
Abstract:Federated Clustering (FC) is crucial to mining knowledge from unlabeled non-Independent Identically Distributed (non-IID) data provided by multiple clients while preserving their privacy. Most existing attempts learn cluster distributions at local clients, and then securely pass the desensitized information to the server for aggregation. However, some tricky but common FC problems are still relatively unexplored, including the heterogeneity in terms of clients' communication capacity and the unknown number of proper clusters $k^*$. To further bridge the gap between FC and real application scenarios, this paper first shows that the clients' communication asynchrony and unknown $k^*$ are complex coupling problems, and then proposes an Asynchronous Federated Cluster Learning (AFCL) method accordingly. It spreads the excessive number of seed points to the clients as a learning medium and coordinates them across the clients to form a consensus. To alleviate the distribution imbalance cumulated due to the unforeseen asynchronous uploading from the heterogeneous clients, we also design a balancing mechanism for seeds updating. As a result, the seeds gradually adapt to each other to reveal a proper number of clusters. Extensive experiments demonstrate the efficacy of AFCL.
Abstract:Clustering complex data in the form of attributed graphs has attracted increasing attention, where appropriate graph representation is a critical prerequisite for accurate cluster analysis. However, the Graph Convolutional Network will homogenize the representation of graph nodes due to the well-known over-smoothing effect. This limits the network architecture to a shallow one, losing the ability to capture the critical global distribution information for clustering. Therefore, we propose a generalized graph auto-encoder network, which introduces quaternion operations to the encoders to achieve efficient structured feature representation learning without incurring deeper network and larger-scale parameters. The generalization of our method lies in the following two aspects: 1) connecting the quaternion operation naturally suitable for four feature components with graph data of arbitrary attribute dimensions, and 2) introducing a generalized graph clustering objective as a loss term to obtain clustering-friendly representations without requiring a pre-specified number of clusters $k$. It turns out that the representations of nodes learned by the proposed Graph Clustering based on Generalized Quaternion representation learning (GCGQ) are more discriminative, containing global distribution information, and are more general, suiting downstream clustering under different $k$s. Extensive experiments including significance tests, ablation studies, and qualitative results, illustrate the superiority of GCGQ. The source code is temporarily opened at \url{https://anonymous.4open.science/r/ICLR-25-No7181-codes}.
Abstract:Long-tail learning has garnered widespread attention and achieved significant progress in recent times. However, even with pre-trained prior knowledge, models still exhibit weaker generalization performance on tail classes. The promising Sharpness-Aware Minimization (SAM) can effectively improve the generalization capability of models by seeking out flat minima in the loss landscape, which, however, comes at the cost of doubling the computational time. Since the update rule of SAM necessitates two consecutive (non-parallelizable) forward and backpropagation at each step. To address this issue, we propose a novel method called Random SAM prompt tuning (RSAM-PT) to improve the model generalization, requiring only one-step gradient computation at each step. Specifically, we search for the gradient descent direction within a random neighborhood of the parameters during each gradient update. To amplify the impact of tail-class samples and avoid overfitting, we employ the deferred re-weight scheme to increase the significance of tail-class samples. The classification accuracy of long-tailed data can be significantly improved by the proposed RSAM-PT, particularly for tail classes. RSAM-PT achieves the state-of-the-art performance of 90.3\%, 76.5\%, and 50.1\% on benchmark datasets CIFAR100-LT (IF 100), iNaturalist 2018, and Places-LT, respectively. The source code is temporarily available at https://github.com/Keke921/GNM-PT.
Abstract:Personalized Federated Learning (PFL) aims to acquire customized models for each client without disclosing raw data by leveraging the collective knowledge of distributed clients. However, the data collected in real-world scenarios is likely to follow a long-tailed distribution. For example, in the medical domain, it is more common for the number of general health notes to be much larger than those specifically relatedto certain diseases. The presence of long-tailed data can significantly degrade the performance of PFL models. Additionally, due to the diverse environments in which each client operates, data heterogeneity is also a classic challenge in federated learning. In this paper, we explore the joint problem of global long-tailed distribution and data heterogeneity in PFL and propose a method called Expert Collaborative Learning (ECL) to tackle this problem. Specifically, each client has multiple experts, and each expert has a different training subset, which ensures that each class, especially the minority classes, receives sufficient training. Multiple experts collaborate synergistically to produce the final prediction output. Without special bells and whistles, the vanilla ECL outperforms other state-of-the-art PFL methods on several benchmark datasets under different degrees of data heterogeneity and long-tailed distribution.
Abstract:Pre-trained large-scale models have exhibited remarkable efficacy in computer vision, particularly for 2D image analysis. However, when it comes to 3D point clouds, the constrained accessibility of data, in contrast to the vast repositories of images, poses a challenge for the development of 3D pre-trained models. This paper therefore attempts to directly leverage pre-trained models with 2D prior knowledge to accomplish the tasks for 3D point cloud analysis. Accordingly, we propose the Adaptive PointFormer (APF), which fine-tunes pre-trained 2D models with only a modest number of parameters to directly process point clouds, obviating the need for mapping to images. Specifically, we convert raw point clouds into point embeddings for aligning dimensions with image tokens. Given the inherent disorder in point clouds, in contrast to the structured nature of images, we then sequence the point embeddings to optimize the utilization of 2D attention priors. To calibrate attention across 3D and 2D domains and reduce computational overhead, a trainable PointFormer with a limited number of parameters is subsequently concatenated to a frozen pre-trained image model. Extensive experiments on various benchmarks demonstrate the effectiveness of the proposed APF. The source code and more details are available at https://vcc.tech/research/2024/PointFormer.
Abstract:The vision-language pre-training has enabled deep models to make a huge step forward in generalizing across unseen domains. The recent learning method based on the vision-language pre-training model is a great tool for domain generalization and can solve this problem to a large extent. However, there are still some issues that an advancement still suffers from trading-off between domain invariance and class separability, which are crucial in current DG problems. However, there are still some issues that an advancement still suffers from trading-off between domain invariance and class separability, which are crucial in current DG problems. In this paper, we introduce a novel prompt learning strategy that leverages deep vision prompts to address domain invariance while utilizing language prompts to ensure class separability, coupled with adaptive weighting mechanisms to balance domain invariance and class separability. Extensive experiments demonstrate that deep vision prompts effectively extract domain-invariant features, significantly improving the generalization ability of deep models and achieving state-of-the-art performance on three datasets.
Abstract:Existing continual learning literature relies heavily on a strong assumption that tasks arrive with a balanced data stream, which is often unrealistic in real-world applications. In this work, we explore task-imbalanced continual learning (TICL) scenarios where the distribution of task data is non-uniform across the whole learning process. We find that imbalanced tasks significantly challenge the capability of models to control the trade-off between stability and plasticity from the perspective of recent prompt-based continual learning methods. On top of the above finding, we propose Dynamically Anchored Prompting (DAP), a prompt-based method that only maintains a single general prompt to adapt to the shifts within a task stream dynamically. This general prompt is regularized in the prompt space with two specifically designed prompt anchors, called boosting anchor and stabilizing anchor, to balance stability and plasticity in TICL. Remarkably, DAP achieves this balance by only storing a prompt across the data stream, therefore offering a substantial advantage in rehearsal-free CL. Extensive experiments demonstrate that the proposed DAP results in 4.5% to 15% absolute improvements over state-of-the-art methods on benchmarks under task-imbalanced settings. Our code is available at https://github.com/chenxing6666/DAP