Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Chang Tang

Mask-informed Deep Contrastive Incomplete Multi-view Clustering

Feb 04, 2025

Zhenglai Li, Yuqi Shi, Xiao He, Chang Tang

Abstract:Multi-view clustering (MvC) utilizes information from multiple views to uncover the underlying structures of data. Despite significant advancements in MvC, mitigating the impact of missing samples in specific views on the integration of knowledge from different views remains a critical challenge. This paper proposes a novel Mask-informed Deep Contrastive Incomplete Multi-view Clustering (Mask-IMvC) method, which elegantly identifies a view-common representation for clustering. Specifically, we introduce a mask-informed fusion network that aggregates incomplete multi-view information while considering the observation status of samples across various views as a mask, thereby reducing the adverse effects of missing values. Additionally, we design a prior knowledge-assisted contrastive learning loss that boosts the representation capability of the aggregated view-common representation by injecting neighborhood information of samples from different views. Finally, extensive experiments are conducted to demonstrate the superiority of the proposed Mask-IMvC method over state-of-the-art approaches across multiple MvC datasets, both in complete and incomplete scenarios.

Via

Access Paper or Ask Questions

Balanced Multi-view Clustering

Jan 05, 2025

Zhenglai Li, Jun Wang, Chang Tang, Xinzhong Zhu, Wei Zhang, Xinwang Liu

Figure 1 for Balanced Multi-view Clustering

Figure 2 for Balanced Multi-view Clustering

Figure 3 for Balanced Multi-view Clustering

Figure 4 for Balanced Multi-view Clustering

Abstract:Multi-view clustering (MvC) aims to integrate information from different views to enhance the capability of the model in capturing the underlying data structures. The widely used joint training paradigm in MvC is potentially not fully leverage the multi-view information, since the imbalanced and under-optimized view-specific features caused by the uniform learning objective for all views. For instance, particular views with more discriminative information could dominate the learning process in the joint training paradigm, leading to other views being under-optimized. To alleviate this issue, we first analyze the imbalanced phenomenon in the joint-training paradigm of multi-view clustering from the perspective of gradient descent for each view-specific feature extractor. Then, we propose a novel balanced multi-view clustering (BMvC) method, which introduces a view-specific contrastive regularization (VCR) to modulate the optimization of each view. Concretely, VCR preserves the sample similarities captured from the joint features and view-specific ones into the clustering distributions corresponding to view-specific features to enhance the learning process of view-specific feature extractors. Additionally, a theoretical analysis is provided to illustrate that VCR adaptively modulates the magnitudes of gradients for updating the parameters of view-specific feature extractors to achieve a balanced multi-view learning procedure. In such a manner, BMvC achieves a better trade-off between the exploitation of view-specific patterns and the exploration of view-invariance patterns to fully learn the multi-view information for the clustering task. Finally, a set of experiments are conducted to verify the superiority of the proposed method compared with state-of-the-art approaches both on eight benchmark MvC datasets and two spatially resolved transcriptomics datasets.

Via

Access Paper or Ask Questions

Trusted Mamba Contrastive Network for Multi-View Clustering

Dec 21, 2024

Jian Zhu, Xin Zou, Lei Liu, Zhangmin Huang, Ying Zhang, Chang Tang, Li-Rong Dai

Figure 1 for Trusted Mamba Contrastive Network for Multi-View Clustering

Figure 2 for Trusted Mamba Contrastive Network for Multi-View Clustering

Figure 3 for Trusted Mamba Contrastive Network for Multi-View Clustering

Figure 4 for Trusted Mamba Contrastive Network for Multi-View Clustering

Abstract:Multi-view clustering can partition data samples into their categories by learning a consensus representation in an unsupervised way and has received more and more attention in recent years. However, there is an untrusted fusion problem. The reasons for this problem are as follows: 1) The current methods ignore the presence of noise or redundant information in the view; 2) The similarity of contrastive learning comes from the same sample rather than the same cluster in deep multi-view clustering. It causes multi-view fusion in the wrong direction. This paper proposes a novel multi-view clustering network to address this problem, termed as Trusted Mamba Contrastive Network (TMCN). Specifically, we present a new Trusted Mamba Fusion Network (TMFN), which achieves a trusted fusion of multi-view data through a selective mechanism. Moreover, we align the fused representation and the view-specific representation using the Average-similarity Contrastive Learning (AsCL) module. AsCL increases the similarity of view presentation from the same cluster, not merely from the same sample. Extensive experiments show that the proposed method achieves state-of-the-art results in deep multi-view clustering tasks.

* accepted by 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing(ICASSP2025)

Via

Access Paper or Ask Questions

Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models

Oct 04, 2024

Xin Zou, Yizhou Wang, Yibo Yan, Sirui Huang, Kening Zheng, Junkai Chen, Chang Tang, Xuming Hu

Figure 1 for Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models

Figure 2 for Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models

Figure 3 for Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models

Figure 4 for Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models

Abstract:Despite their impressive capabilities, Multimodal Large Language Models (MLLMs) are susceptible to hallucinations, especially assertively fabricating content not present in the visual inputs. To address the aforementioned challenge, we follow a common cognitive process - when one's initial memory of critical on-sight details fades, it is intuitive to look at them a second time to seek a factual and accurate answer. Therefore, we introduce Memory-space Visual Retracing (MemVR), a novel hallucination mitigation paradigm that without the need for external knowledge retrieval or additional fine-tuning. In particular, we treat visual prompts as supplementary evidence to be reinjected into MLLMs via Feed Forward Network (FFN) as key-value memory, when the model is uncertain or even amnesic about question-relevant visual memories. Comprehensive experimental evaluations demonstrate that MemVR significantly mitigates hallucination issues across various MLLMs and excels in general benchmarks without incurring added time overhead, thus emphasizing its potential for widespread applicability.

Via

Access Paper or Ask Questions

Learning to Discover Generalized Facial Expressions

Sep 30, 2024

Tingzhang Luo, Yichao Liu, Yuanyuan Liu, Andi Zhang, Xin Wang, Chang Tang, Zhe Chen

Figure 1 for Learning to Discover Generalized Facial Expressions

Figure 2 for Learning to Discover Generalized Facial Expressions

Figure 3 for Learning to Discover Generalized Facial Expressions

Figure 4 for Learning to Discover Generalized Facial Expressions

Abstract:We introduce Facial Expression Category Discovery (FECD), a novel task in the domain of open-world facial expression recognition (O-FER). While Generalized Category Discovery (GCD) has been explored in natural image datasets, applying it to facial expressions presents unique challenges. Specifically, we identify two key biases to better understand these challenges: Theoretical Bias-arising from the introduction of new categories in unlabeled training data, and Practical Bias-stemming from the imbalanced and fine-grained nature of facial expression data. To address these challenges, we propose FER-GCD, an adversarial approach that integrates both implicit and explicit debiasing components. In the implicit debiasing process, we devise F-discrepancy, a novel metric used to estimate the upper bound of Theoretical Bias, helping the model minimize this upper bound through adversarial training. The explicit debiasing process further optimizes the feature generator and classifier to reduce Practical Bias. Extensive experiments on GCD-based FER datasets demonstrate that our FER-GCD framework significantly improves accuracy on both old and new categories, achieving an average improvement of 9.8% over the baseline and outperforming state-of-the-art methods.

Via

Access Paper or Ask Questions

Multi-level Graph Subspace Contrastive Learning for Hyperspectral Image Clustering

Apr 08, 2024

Jingxin Wang, Renxiang Guan, Kainan Gao, Zihao Li, Hao Li, Xianju Li, Chang Tang

Abstract:Hyperspectral image (HSI) clustering is a challenging task due to its high complexity. Despite subspace clustering shows impressive performance for HSI, traditional methods tend to ignore the global-local interaction in HSI data. In this study, we proposed a multi-level graph subspace contrastive learning (MLGSC) for HSI clustering. The model is divided into the following main parts. Graph convolution subspace construction: utilizing spectral and texture feautures to construct two graph convolution views. Local-global graph representation: local graph representations were obtained by step-by-step convolutions and a more representative global graph representation was obtained using an attention-based pooling strategy. Multi-level graph subspace contrastive learning: multi-level contrastive learning was conducted to obtain local-global joint graph representations, to improve the consistency of the positive samples between views, and to obtain more robust graph embeddings. Specifically, graph-level contrastive learning is used to better learn global representations of HSI data. Node-level intra-view and inter-view contrastive learning is designed to learn joint representations of local regions of HSI. The proposed model is evaluated on four popular HSI datasets: Indian Pines, Pavia University, Houston, and Xu Zhou. The overall accuracies are 97.75%, 99.96%, 92.28%, and 95.73%, which significantly outperforms the current state-of-the-art clustering methods.

* IJCNN 2024

Via

Access Paper or Ask Questions

Pixel-Superpixel Contrastive Learning and Pseudo-Label Correction for Hyperspectral Image Clustering

Dec 15, 2023

Renxiang Guan, Zihao Li, Xianju Li, Chang Tang

Figure 1 for Pixel-Superpixel Contrastive Learning and Pseudo-Label Correction for Hyperspectral Image Clustering

Figure 2 for Pixel-Superpixel Contrastive Learning and Pseudo-Label Correction for Hyperspectral Image Clustering

Figure 3 for Pixel-Superpixel Contrastive Learning and Pseudo-Label Correction for Hyperspectral Image Clustering

Figure 4 for Pixel-Superpixel Contrastive Learning and Pseudo-Label Correction for Hyperspectral Image Clustering

Abstract:Hyperspectral image (HSI) clustering is gaining considerable attention owing to recent methods that overcome the inefficiency and misleading results from the absence of supervised information. Contrastive learning methods excel at existing pixel level and super pixel level HSI clustering tasks. The pixel-level contrastive learning method can effectively improve the ability of the model to capture fine features of HSI but requires a large time overhead. The super pixel-level contrastive learning method utilizes the homogeneity of HSI and reduces computing resources; however, it yields rough classification results. To exploit the strengths of both methods, we present a pixel super pixel contrastive learning and pseudo-label correction (PSCPC) method for the HSI clustering. PSCPC can reasonably capture domain-specific and fine-grained features through super pixels and the comparative learning of a small number of pixels within the super pixels. To improve the clustering performance of super pixels, this paper proposes a pseudo-label correction module that aligns the clustering pseudo-labels of pixels and super-pixels. In addition, pixel-level clustering results are used to supervise super pixel-level clustering, improving the generalization ability of the model. Extensive experiments demonstrate the effectiveness and efficiency of PSCPC.

* Accepted at IEEE ICASSP 2024

Via

Access Paper or Ask Questions

Contrastive Multi-view Subspace Clustering of Hyperspectral Images based on Graph Convolutional Networks

Dec 11, 2023

Renxiang Guan, Zihao Li, Xianju Li, Chang Tang, Ruyi Feng

Figure 1 for Contrastive Multi-view Subspace Clustering of Hyperspectral Images based on Graph Convolutional Networks

Figure 2 for Contrastive Multi-view Subspace Clustering of Hyperspectral Images based on Graph Convolutional Networks

Figure 3 for Contrastive Multi-view Subspace Clustering of Hyperspectral Images based on Graph Convolutional Networks

Figure 4 for Contrastive Multi-view Subspace Clustering of Hyperspectral Images based on Graph Convolutional Networks

Abstract:High-dimensional and complex spectral structures make the clustering of hyperspectral images (HSI) a challenging task. Subspace clustering is an effective approach for addressing this problem. However, current subspace clustering algorithms are primarily designed for a single view and do not fully exploit the spatial or textural feature information in HSI. In this study, contrastive multi-view subspace clustering of HSI was proposed based on graph convolutional networks. Pixel neighbor textural and spatial-spectral information were sent to construct two graph convolutional subspaces to learn their affinity matrices. To maximize the interaction between different views, a contrastive learning algorithm was introduced to promote the consistency of positive samples and assist the model in extracting robust features. An attention-based fusion module was used to adaptively integrate these affinity matrices, constructing a more discriminative affinity matrix. The model was evaluated using four popular HSI datasets: Indian Pines, Pavia University, Houston, and Xu Zhou. It achieved overall accuracies of 97.61%, 96.69%, 87.21%, and 97.65%, respectively, and significantly outperformed state-of-the-art clustering methods. In conclusion, the proposed model effectively improves the clustering accuracy of HSI.

Via

Access Paper or Ask Questions

MS-Former: Memory-Supported Transformer for Weakly Supervised Change Detection with Patch-Level Annotations

Nov 16, 2023

Zhenglai Li, Chang Tang, Xinwang Liu, Changdong Li, Xianju Li, Wei Zhang

Figure 1 for MS-Former: Memory-Supported Transformer for Weakly Supervised Change Detection with Patch-Level Annotations

Figure 2 for MS-Former: Memory-Supported Transformer for Weakly Supervised Change Detection with Patch-Level Annotations

Figure 3 for MS-Former: Memory-Supported Transformer for Weakly Supervised Change Detection with Patch-Level Annotations

Figure 4 for MS-Former: Memory-Supported Transformer for Weakly Supervised Change Detection with Patch-Level Annotations

Abstract:Fully supervised change detection methods have achieved significant advancements in performance, yet they depend severely on acquiring costly pixel-level labels. Considering that the patch-level annotations also contain abundant information corresponding to both changed and unchanged objects in bi-temporal images, an intuitive solution is to segment the changes with patch-level annotations. How to capture the semantic variations associated with the changed and unchanged regions from the patch-level annotations to obtain promising change results is the critical challenge for the weakly supervised change detection task. In this paper, we propose a memory-supported transformer (MS-Former), a novel framework consisting of a bi-directional attention block (BAB) and a patch-level supervision scheme (PSS) tailored for weakly supervised change detection with patch-level annotations. More specifically, the BAM captures contexts associated with the changed and unchanged regions from the temporal difference features to construct informative prototypes stored in the memory bank. On the other hand, the BAM extracts useful information from the prototypes as supplementary contexts to enhance the temporal difference features, thereby better distinguishing changed and unchanged regions. After that, the PSS guides the network learning valuable knowledge from the patch-level annotations, thus further elevating the performance. Experimental results on three benchmark datasets demonstrate the effectiveness of our proposed method in the change detection task. The demo code for our work will be publicly available at \url{https://github.com/guanyuezhen/MS-Former}.

* 11 pages, 6 figures

Via

Access Paper or Ask Questions

Central Similarity Multi-View Hashing for Multimedia Retrieval

Aug 26, 2023

Jian Zhu, Wen Cheng, Yu Cui, Chang Tang, Yuyang Dai, Yong Li, Lingfang Zeng

Figure 1 for Central Similarity Multi-View Hashing for Multimedia Retrieval

Figure 2 for Central Similarity Multi-View Hashing for Multimedia Retrieval

Figure 3 for Central Similarity Multi-View Hashing for Multimedia Retrieval

Figure 4 for Central Similarity Multi-View Hashing for Multimedia Retrieval

Abstract:Hash representation learning of multi-view heterogeneous data is the key to improving the accuracy of multimedia retrieval. However, existing methods utilize local similarity and fall short of deeply fusing the multi-view features, resulting in poor retrieval accuracy. Current methods only use local similarity to train their model. These methods ignore global similarity. Furthermore, most recent works fuse the multi-view features via a weighted sum or concatenation. We contend that these fusion methods are insufficient for capturing the interaction between various views. We present a novel Central Similarity Multi-View Hashing (CSMVH) method to address the mentioned problems. Central similarity learning is used for solving the local similarity problem, which can utilize the global similarity between the hash center and samples. We present copious empirical data demonstrating the superiority of gate-based fusion over conventional approaches. On the MS COCO and NUS-WIDE, the proposed CSMVH performs better than the state-of-the-art methods by a large margin (up to 11.41% mean Average Precision (mAP) improvement).

* accepted by the Asia Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data (APWeb-WAIM2023)

Via

Access Paper or Ask Questions