Constructing a 3D scene capable of accommodating open-ended language queries, is a pivotal pursuit, particularly within the domain of robotics. Such technology facilitates robots in executing object manipulations based on human language directives. To tackle this challenge, some research efforts have been dedicated to the development of language-embedded implicit fields. However, implicit fields (e.g. NeRF) encounter limitations due to the necessity of processing a large number of input views for reconstruction, coupled with their inherent inefficiencies in inference. Thus, we present the GaussianGrasper, which utilizes 3D Gaussian Splatting to explicitly represent the scene as a collection of Gaussian primitives. Our approach takes a limited set of RGB-D views and employs a tile-based splatting technique to create a feature field. In particular, we propose an Efficient Feature Distillation (EFD) module that employs contrastive learning to efficiently and accurately distill language embeddings derived from foundational models. With the reconstructed geometry of the Gaussian field, our method enables the pre-trained grasping model to generate collision-free grasp pose candidates. Furthermore, we propose a normal-guided grasp module to select the best grasp pose. Through comprehensive real-world experiments, we demonstrate that GaussianGrasper enables robots to accurately query and grasp objects with language instructions, providing a new solution for language-guided manipulation tasks. Data and codes can be available at https://github.com/MrSecant/GaussianGrasper.
Fine-grained ship classification in remote sensing (RS-FGSC) poses a significant challenge due to the high similarity between classes and the limited availability of labeled data, limiting the effectiveness of traditional supervised classification methods. Recent advancements in large pre-trained Vision-Language Models (VLMs) have demonstrated impressive capabilities in few-shot or zero-shot learning, particularly in understanding image content. This study delves into harnessing the potential of VLMs to enhance classification accuracy for unseen ship categories, which holds considerable significance in scenarios with restricted data due to cost or privacy constraints. Directly fine-tuning VLMs for RS-FGSC often encounters the challenge of overfitting the seen classes, resulting in suboptimal generalization to unseen classes, which highlights the difficulty in differentiating complex backgrounds and capturing distinct ship features. To address these issues, we introduce a novel prompt tuning technique that employs a hierarchical, multi-granularity prompt design. Our approach integrates remote sensing ship priors through bias terms, learned from a small trainable network. This strategy enhances the model's generalization capabilities while improving its ability to discern intricate backgrounds and learn discriminative ship features. Furthermore, we contribute to the field by introducing a comprehensive dataset, FGSCM-52, significantly expanding existing datasets with more extensive data and detailed annotations for less common ship classes. Extensive experimental evaluations demonstrate the superiority of our proposed method over current state-of-the-art techniques. The source code will be made publicly available.
The lack of object-level annotations poses a significant challenge for object detection in remote sensing images (RSIs). To address this issue, active learning (AL) and semi-supervised learning (SSL) techniques have been proposed to enhance the quality and quantity of annotations. AL focuses on selecting the most informative samples for annotation, while SSL leverages the knowledge from unlabeled samples. In this letter, we propose a novel AL method to boost semi-supervised object detection (SSOD) for remote sensing images with a teacher student network, called SSOD-AT. The proposed method incorporates an RoI comparison module (RoICM) to generate high-confidence pseudo-labels for regions of interest (RoIs). Meanwhile, the RoICM is utilized to identify the top-K uncertain images. To reduce redundancy in the top-K uncertain images for human labeling, a diversity criterion is introduced based on object-level prototypes of different categories using both labeled and pseudo-labeled images. Extensive experiments on DOTA and DIOR, two popular datasets, demonstrate that our proposed method outperforms state-of-the-art methods for object detection in RSIs. Compared with the best performance in the SOTA methods, the proposed method achieves 1 percent improvement in most cases in the whole AL.
Pseudo-labelling is a popular technique in unsuper-vised domain adaptation for semantic segmentation. However, pseudo labels are noisy and inevitably have confirmation bias due to the discrepancy between source and target domains and training process. In this paper, we train the model by the pseudo labels which are implicitly produced by itself to learn new complementary knowledge about target domain. Specifically, we propose a tri-learning architecture, where every two branches produce the pseudo labels to train the third one. And we align the pseudo labels based on the similarity of the probability distributions for each two branches. To further implicitly utilize the pseudo labels, we maximize the distances of features for different classes and minimize the distances for the same classes by triplet loss. Extensive experiments on GTA5 to Cityscapes and SYNTHIA to Cityscapes tasks show that the proposed method has considerable improvements.
Multi-label learning draws great interests in many real world applications. It is a highly costly task to assign many labels by the oracle for one instance. Meanwhile, it is also hard to build a good model without diagnosing discriminative labels. Can we reduce the label costs and improve the ability to train a good model for multi-label learning simultaneously? Active learning addresses the less training samples problem by querying the most valuable samples to achieve a better performance with little costs. In multi-label active learning, some researches have been done for querying the relevant labels with less training samples or querying all labels without diagnosing the discriminative information. They all cannot effectively handle the outlier labels for the measurement of uncertainty. Since Maximum Correntropy Criterion (MCC) provides a robust analysis for outliers in many machine learning and data mining algorithms, in this paper, we derive a robust multi-label active learning algorithm based on MCC by merging uncertainty and representativeness, and propose an efficient alternating optimization method to solve it. With MCC, our method can eliminate the influence of outlier labels that are not discriminative to measure the uncertainty. To make further improvement on the ability of information measurement, we merge uncertainty and representativeness with the prediction labels of unknown data. It can not only enhance the uncertainty but also improve the similarity measurement of multi-label data with labels information. Experiments on benchmark multi-label data sets have shown a superior performance than the state-of-the-art methods.
How can we find a general way to choose the most suitable samples for training a classifier? Even with very limited prior information? Active learning, which can be regarded as an iterative optimization procedure, plays a key role to construct a refined training set to improve the classification performance in a variety of applications, such as text analysis, image recognition, social network modeling, etc. Although combining representativeness and informativeness of samples has been proven promising for active sampling, state-of-the-art methods perform well under certain data structures. Then can we find a way to fuse the two active sampling criteria without any assumption on data? This paper proposes a general active learning framework that effectively fuses the two criteria. Inspired by a two-sample discrepancy problem, triple measures are elaborately designed to guarantee that the query samples not only possess the representativeness of the unlabeled data but also reveal the diversity of the labeled data. Any appropriate similarity measure can be employed to construct the triple measures. Meanwhile, an uncertain measure is leveraged to generate the informativeness criterion, which can be carried out in different ways. Rooted in this framework, a practical active learning algorithm is proposed, which exploits a radial basis function together with the estimated probabilities to construct the triple measures and a modified Best-versus-Second-Best strategy to construct the uncertain measure, respectively. Experimental results on benchmark datasets demonstrate that our algorithm consistently achieves superior performance over the state-of-the-art active learning algorithms.
Labeling each instance in a large dataset is extremely labor- and time- consuming . One way to alleviate this problem is active learning, which aims to which discover the most valuable instances for labeling to construct a powerful classifier. Considering both informativeness and representativeness provides a promising way to design a practical active learning. However, most existing active learning methods select instances favoring either informativeness or representativeness. Meanwhile, many are designed based on the binary class, so that they may present suboptimal solutions on the datasets with multiple classes. In this paper, a hybrid informative and representative criterion based multi-class active learning approach is proposed. We combine the informative informativeness and representativeness into one formula, which can be solved under a unified framework. The informativeness is measured by the margin minimum while the representative information is measured by the maximum mean discrepancy. By minimizing the upper bound for the true risk, we generalize the empirical risk minimization principle to the active learning setting. Simultaneously, our proposed method makes full use of the label information, and the proposed active learning is designed based on multiple classes. So the proposed method is not suitable to the binary class but also the multiple classes. We conduct our experiments on twelve benchmark UCI data sets, and the experimental results demonstrate that the proposed method performs better than some state-of-the-art methods.