Pseudo Labeling is a technique used to improve the performance of semi-supervised Graph Neural Networks (GNNs) by generating additional pseudo-labels based on confident predictions. However, the quality of generated pseudo-labels has long been a concern due to the sensitivity of the classification objective to given labels. To avoid the untrustworthy classification supervision indicating ``a node belongs to a specific class,'' we favor the fault-tolerant contrasting supervision demonstrating ``two nodes do not belong to the same class.'' Thus, the problem of generating high-quality pseudo-labels is then transformed into a relaxed version, i.e., finding reliable contrasting pairs. To achieve this, we propose a general framework for GNNs, termed Pseudo Contrastive Learning (PCL). It separates two nodes whose positive and negative pseudo-labels target the same class. To incorporate topological knowledge into learning, we devise a topologically weighted contrastive loss that spends more effort separating negative pairs with smaller topological distances. Additionally, to alleviate the heavy reliance on data augmentation, we augment nodes only by applying dropout to the encoded representations. Theoretically, we prove that PCL with the lightweight augmentation works like a representation regularizer to effectively learn separation between negative pairs. Experimentally, we employ PCL on various models, which consistently outperform their counterparts using other popular general techniques on five real-world graphs.
In autonomous driving, 3D object detection based on multi-modal data has become an indispensable approach when facing complex environments around the vehicle. During multi-modal detection, LiDAR and camera are simultaneously applied for capturing and modeling. However, due to the intrinsic discrepancies between the LiDAR point and camera image, the fusion of the data for object detection encounters a series of problems. Most multi-modal detection methods perform even worse than LiDAR-only methods. In this investigation, we propose a method named PTA-Det to improve the performance of multi-modal detection. Accompanied by PTA-Det, a Pseudo Point Cloud Generation Network is proposed, which can convert image information including texture and semantic features by pseudo points. Thereafter, through a transformer-based Point Fusion Transition (PFT) module, the features of LiDAR points and pseudo points from image can be deeply fused under a unified point-based representation. The combination of these modules can conquer the major obstacle in feature fusion across modalities and realizes a complementary and discriminative representation for proposal generation. Extensive experiments on the KITTI dataset show the PTA-Det achieves a competitive result and support its effectiveness.
OOD-CV challenge is an out-of-distribution generalization task. To solve this problem in object detection track, we propose a simple yet effective Generalize-then-Adapt (G&A) framework, which is composed of a two-stage domain generalization part and a one-stage domain adaptation part. The domain generalization part is implemented by a Supervised Model Pretraining stage using source data for model warm-up and a Weakly Semi-Supervised Model Pretraining stage using both source data with box-level label and auxiliary data (ImageNet-1K) with image-level label for performance boosting. The domain adaptation part is implemented as a Source-Free Domain Adaptation paradigm, which only uses the pre-trained model and the unlabeled target data to further optimize in a self-supervised training manner. The proposed G&A framework help us achieve the first place on the object detection leaderboard of the OOD-CV challenge. Code will be released in https://github.com/hikvision-research/OOD-CV.
Automatic segmentation of head and neck cancer (HNC) tumors and lymph nodes plays a crucial role in the optimization treatment strategy and prognosis analysis. This study aims to employ nnU-Net for automatic segmentation and radiomics for recurrence-free survival (RFS) prediction using pretreatment PET/CT images in multi-center HNC cohort. A multi-center HNC dataset with 883 patients (524 patients for training, 359 for testing) was provided in HECKTOR 2022. A bounding box of the extended oropharyngeal region was retrieved for each patient with fixed size of 224 x 224 x 224 $mm^{3}$. Then 3D nnU-Net architecture was adopted to automatic segmentation of primary tumor and lymph nodes synchronously.Based on predicted segmentation, ten conventional features and 346 standardized radiomics features were extracted for each patient. Three prognostic models were constructed containing conventional and radiomics features alone, and their combinations by multivariate CoxPH modelling. The statistical harmonization method, ComBat, was explored towards reducing multicenter variations. Dice score and C-index were used as evaluation metrics for segmentation and prognosis task, respectively. For segmentation task, we achieved mean dice score around 0.701 for primary tumor and lymph nodes by 3D nnU-Net. For prognostic task, conventional and radiomics models obtained the C-index of 0.658 and 0.645 in the test set, respectively, while the combined model did not improve the prognostic performance with the C-index of 0.648.
The diffusion of nonlinear loads and power electronic devices in power systems deteriorates the signal environment and increases the difficulty of measuring harmonic phasors. Considering accurate harmonic phasor information is necessary to deal with harmonic-related issues, this paper focuses on realizing accurate dynamic harmonic phasor estimation when the signal is contaminated by certain interharmonic tones, i.e. the out-of-band interference (OBI). Specifically, this work introduces the singular value decomposition into the Taylor-Fourier transform algorithm, deriving a general decomposition form for harmonic phasor filters, and yielding a set of adjustable parameters. Then these parameters are configured to minimize the negative influence of OBI on the dynamic harmonic phasor filters. Based on the recommended parameter values, the optimized harmonic estimator is obtained, and then tested under signals distorted by OBI, harmonic, noise, and in worse cases, frequency deviation, and some dynamic conditions. Test results show the proposal achieves good harmonic phasor estimation even under multiple interferences and dynamic scenarios, and has a much weaker dependence on OBI tones compared to conventional approaches.
Recent self-supervised pre-training methods on Heterogeneous Information Networks (HINs) have shown promising competitiveness over traditional semi-supervised Heterogeneous Graph Neural Networks (HGNNs). Unfortunately, their performance heavily depends on careful customization of various strategies for generating high-quality positive examples and negative examples, which notably limits their flexibility and generalization ability. In this work, we present SHGP, a novel Self-supervised Heterogeneous Graph Pre-training approach, which does not need to generate any positive examples or negative examples. It consists of two modules that share the same attention-aggregation scheme. In each iteration, the Att-LPA module produces pseudo-labels through structural clustering, which serve as the self-supervision signals to guide the Att-HGNN module to learn object embeddings and attention coefficients. The two modules can effectively utilize and enhance each other, promoting the model to learn discriminative embeddings. Extensive experiments on four real-world datasets demonstrate the superior effectiveness of SHGP against state-of-the-art unsupervised baselines and even semi-supervised baselines. We release our source code at: https://github.com/kepsail/SHGP.
Structured pruning techniques have achieved great compression performance on convolutional neural networks for image classification task. However, the majority of existing methods are weight-oriented, and their pruning results may be unsatisfactory when the original model is trained poorly. That is, a fully-trained model is required to provide useful weight information. This may be time-consuming, and the pruning results are sensitive to the updating process of model parameters. In this paper, we propose a metric named Average Filter Information Entropy (AFIE) to measure the importance of each filter. It is calculated by three major steps, i.e., low-rank decomposition of the "input-output" matrix of each convolutional layer, normalization of the obtained eigenvalues, and calculation of filter importance based on information entropy. By leveraging the proposed AFIE, the proposed framework is able to yield a stable importance evaluation of each filter no matter whether the original model is trained fully. We implement our AFIE based on AlexNet, VGG-16, and ResNet-50, and test them on MNIST, CIFAR-10, and ImageNet, respectively. The experimental results are encouraging. We surprisingly observe that for our methods, even when the original model is only trained with one epoch, the importance evaluation of each filter keeps identical to the results when the model is fully-trained. This indicates that the proposed pruning strategy can perform effectively at the beginning stage of the training process for the original model.
Pruning techniques are used comprehensively to compress convolutional neural networks (CNNs) on image classification. However, the majority of pruning methods require a well pre-trained model to provide useful supporting parameters, such as C1-norm, BatchNorm value and gradient information, which may lead to inconsistency of filter evaluation if the parameters of the pre-trained model are not well optimized. Therefore, we propose a sensitiveness based method to evaluate the importance of each layer from the perspective of inference accuracy by adding extra damage for the original model. Because the performance of the accuracy is determined by the distribution of parameters across all layers rather than individual parameter, the sensitiveness based method will be robust to update of parameters. Namely, we can obtain similar importance evaluation of each convolutional layer between the imperfect-trained and fully trained models. For VGG-16 on CIFAR-10, even when the original model is only trained with 50 epochs, we can get same evaluation of layer importance as the results when the model is trained fully. Then we will remove filters proportional from each layer by the quantified sensitiveness. Our sensitiveness based pruning framework is verified efficiently on VGG-16, a customized Conv-4 and ResNet-18 with CIFAR-10, MNIST and CIFAR-100, respectively.
Recently, several spatial-temporal memory-based methods have verified that storing intermediate frames and their masks as memory are helpful to segment target objects in videos. However, they mainly focus on better matching between the current frame and the memory frames without explicitly paying attention to the quality of the memory. Therefore, frames with poor segmentation masks are prone to be memorized, which leads to a segmentation mask error accumulation problem and further affect the segmentation performance. In addition, the linear increase of memory frames with the growth of frame number also limits the ability of the models to handle long videos. To this end, we propose a Quality-aware Dynamic Memory Network (QDMN) to evaluate the segmentation quality of each frame, allowing the memory bank to selectively store accurately segmented frames to prevent the error accumulation problem. Then, we combine the segmentation quality with temporal consistency to dynamically update the memory bank to improve the practicability of the models. Without any bells and whistles, our QDMN achieves new state-of-the-art performance on both DAVIS and YouTube-VOS benchmarks. Moreover, extensive experiments demonstrate that the proposed Quality Assessment Module (QAM) can be applied to memory-based methods as generic plugins and significantly improves performance. Our source code is available at https://github.com/workforai/QDMN.
Traffic flow forecasting is essential for traffic planning, control and management. The main challenge of traffic forecasting tasks is accurately capturing traffic networks' spatial and temporal correlation. Although there are many traffic forecasting methods, most of them still have limitations in capturing spatial and temporal correlations. To improve traffic forecasting accuracy, we propose a new Spatial-temporal forecasting model, namely the Residual Graph Convolutional Recurrent Network (RGCRN). The model uses our proposed Residual Graph Convolutional Network (ResGCN) to capture the fine-grained spatial correlation of the traffic road network and then uses a Bi-directional Gated Recurrent Unit (BiGRU) to model time series with spatial information and obtains the temporal correlation by analysing the change in information transfer between the forward and reverse neurons of the time series data. Our comparative experimental results on two real datasets show that RGCRN improves on average by 20.66% compared to the best baseline model. You can get our source code and data through https://github.com/zhangshqii/RGCRN.