Multi-view clustering can partition data samples into their categories by learning a consensus representation in unsupervised way and has received more and more attention in recent years. However, most existing deep clustering methods learn consensus representation or view-specific representations from multiple views via view-wise aggregation way, where they ignore structure relationship of all samples. In this paper, we propose a novel multi-view clustering network to address these problems, called Global and Cross-view Feature Aggregation for Multi-View Clustering (GCFAggMVC). Specifically, the consensus data presentation from multiple views is obtained via cross-sample and cross-view feature aggregation, which fully explores the complementary ofsimilar samples. Moreover, we align the consensus representation and the view-specific representation by the structure-guided contrastive learning module, which makes the view-specific representations from different samples with high structure relationship similar. The proposed module is a flexible multi-view data representation module, which can be also embedded to the incomplete multi-view data clustering task via plugging our module into other frameworks. Extensive experiments show that the proposed method achieves excellent performance in both complete multi-view data clustering tasks and incomplete multi-view data clustering tasks.
Many dense 3D point clouds have been exploited to represent visual objects instead of traditional images or videos. To evaluate the perceptual quality of various point clouds, in this letter, we propose a novel and efficient Reduced-Reference quality metric for point clouds, which is based on Content-oriented sAliency Projection (RR-CAP). Specifically, we make the first attempt to simplify reference and distorted point clouds into projected saliency maps with a downsampling operation. Through this process, we tackle the issue of transmitting large-volume original point clouds to user-ends for quality assessment. Then, motivated by the characteristics of the human visual system (HVS), the objective quality scores of distorted point clouds are produced by combining content-oriented similarity and statistical correlation measurements. Finally, extensive experiments are conducted on SJTU-PCQA and WPC databases. The experimental results demonstrate that our proposed algorithm outperforms existing reduced-reference and no-reference quality metrics, and significantly reduces the performance gap between state-of-the-art full-reference quality assessment methods. In addition, we show the performance variation of each proposed technical component by ablation tests.
For the video salient object detection (VSOD) task, how to excavate the information from the appearance modality and the motion modality has always been a topic of great concern. The two-stream structure, including an RGB appearance stream and an optical flow motion stream, has been widely used as a typical pipeline for VSOD tasks, but the existing methods usually only use motion features to unidirectionally guide appearance features or adaptively but blindly fuse two modality features. However, these methods underperform in diverse scenarios due to the uncomprehensive and unspecific learning schemes. In this paper, following a more secure modeling philosophy, we deeply investigate the importance of appearance modality and motion modality in a more comprehensive way and propose a VSOD network with up and down parallel symmetry, named PSNet. Two parallel branches with different dominant modalities are set to achieve complete video saliency decoding with the cooperation of the Gather Diffusion Reinforcement (GDR) module and Cross-modality Refinement and Complement (CRC) module. Finally, we use the Importance Perception Fusion (IPF) module to fuse the features from two parallel branches according to their different importance in different scenarios. Experiments on four dataset benchmarks demonstrate that our method achieves desirable and competitive performance.