In recent years, face clustering has seen significant breakthroughs, benefiting from the expressive power of Graph Convolutional Networks (GCNs). However, little attention has been paid to GCN-based clustering on imbalanced data. Although the imbalance problem has been extensively studied, the impact of imbalanced data on the GCN-based linkage prediction task is quite different, and it causes problems in two respects: imbalanced linkage labels and biased graph representations. The former is similar to the imbalance problem in image classification, whereas the latter is particular to GCN-based clustering via linkage prediction. Significantly biased graph representations in training can cause catastrophic overfitting of a GCN model. To tackle these problems, we evaluate, through extensive experiments, the feasibility of applying existing methods for imbalanced image classification to graphs, and we present a new method that alleviates the imbalanced labels and also augments graph representations using a Reverse-Imbalance Weighted Sampling (RIWS) strategy, followed by insightful analyses and discussions. The code and a series of imbalanced benchmark datasets synthesized from MS-Celeb-1M and DeepFashion are available at https://github.com/espectre/GCNs_on_imbalanced_datasets.