Huaxiong Li

Joint Projection Learning and Tensor Decomposition Based Incomplete Multi-view Clustering

Oct 06, 2023
Wei Lv, Chao Zhang, Huaxiong Li, Xiuyi Jia, Chunlin Chen

Incomplete multi-view clustering (IMVC) has received increasing attention since, in practice, some views of samples are often incomplete. Most existing methods learn similarity subgraphs from the original incomplete multi-view data and seek complete graphs by exploring the incomplete subgraphs of each view for spectral clustering. However, graphs constructed on the original high-dimensional data may be suboptimal due to feature redundancy and noise. Besides, previous methods generally ignore the graph noise caused by inter-class and intra-class structure variation during the transformation from incomplete graphs to complete graphs. To address these problems, we propose a novel Joint Projection Learning and Tensor Decomposition Based method (JPLTD) for IMVC. Specifically, to alleviate the influence of redundant features and noise in high-dimensional data, JPLTD introduces an orthogonal projection matrix that projects the high-dimensional features into a lower-dimensional space for compact feature learning. Meanwhile, in this lower-dimensional space, the similarity graphs corresponding to instances of different views are learned, and JPLTD stacks these graphs into a third-order low-rank tensor to explore the high-order correlations across different views. We further consider the graph noise of projected data caused by missing samples and use a tensor-decomposition-based graph filter for robust clustering. JPLTD decomposes the original tensor into an intrinsic tensor and a sparse tensor, where the intrinsic tensor models the true data similarities. An effective optimization algorithm is adopted to solve the JPLTD model. Comprehensive experiments on several benchmark datasets demonstrate that JPLTD outperforms state-of-the-art methods. The code of JPLTD is available at https://github.com/weilvNJU/JPLTD.

* IEEE Transactions on Neural Networks and Learning Systems, 2023 
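
As a rough illustration of the pipeline the abstract sketches — orthogonal projection to a compact space, per-view similarity graphs stacked into a third-order tensor, and a split into an intrinsic part plus sparse noise — the NumPy snippet below wires these steps together. It is a minimal sketch only: the per-slice SVD truncation and soft-thresholding stand in for JPLTD's actual tensor decomposition and optimization algorithm, and all function names, ranks, and thresholds are assumptions.

```python
import numpy as np

def orthogonal_projection(X, d):
    """Project features X (n x D) onto d orthonormal directions (PCA-style)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:d].T                      # D x d projection with orthonormal columns
    return Xc @ P                     # n x d compact features

def similarity_graph(Z, sigma=1.0):
    """Gaussian-kernel similarity graph on projected features Z (n x d)."""
    sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def lowrank_plus_sparse(T, rank=5, lam=0.1, iters=20):
    """Crude per-slice split of a graph tensor into a low-rank 'intrinsic' part
    plus a sparse noise part (a stand-in for the paper's tensor decomposition)."""
    L, S = np.zeros_like(T), np.zeros_like(T)
    for _ in range(iters):
        for v in range(T.shape[0]):
            U, s, Vt = np.linalg.svd(T[v] - S[v], full_matrices=False)
            L[v] = (U[:, :rank] * s[:rank]) @ Vt[:rank]          # low-rank slice
        S = np.sign(T - L) * np.maximum(np.abs(T - L) - lam, 0)  # soft-threshold residual
    return L, S

# Toy run: 3 views, 50 samples, different feature dimensions per view.
views = [np.random.randn(50, D) for D in (40, 60, 30)]
graphs = np.stack([similarity_graph(orthogonal_projection(X, d=10)) for X in views])
intrinsic, noise = lowrank_plus_sparse(graphs)   # intrinsic graphs would feed spectral clustering
```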

Contrastive Latent Space Reconstruction Learning for Audio-Text Retrieval

Sep 16, 2023
Kaiyi Luo, Xulong Zhang, Jianzong Wang, Huaxiong Li, Ning Cheng, Jing Xiao

Cross-modal retrieval (CMR) has been extensively applied in various domains, such as multimedia search engines and recommendation systems. Most existing CMR methods focus on image-to-text retrieval, whereas audio-to-text retrieval, a less explored domain, poses a great challenge due to the difficulty of uncovering discriminative features from audio clips and texts. Existing studies are restricted in two ways: 1) Most researchers utilize contrastive learning to construct a common subspace where similarities among data can be measured; however, they consider only cross-modal transformation, neglecting intra-modal separability. Besides, the temperature parameter is not adaptively adjusted with semantic guidance, which degrades performance. 2) These methods do not take latent representation reconstruction into account, which is essential for semantic alignment. This paper introduces a novel audio-text-oriented CMR approach, termed Contrastive Latent Space Reconstruction Learning (CLSR). CLSR improves contrastive representation learning by taking intra-modal separability into account and adopting an adaptive temperature control strategy. Moreover, latent representation reconstruction modules are embedded into the CMR framework, which improves modal interaction. Experiments comparing CLSR with state-of-the-art methods on two audio-text datasets validate its superiority.

* Accepted by the 35th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2023) 
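
A minimal PyTorch sketch of how the pieces named above (cross-modal contrast, an intra-modal separability term, an adaptive temperature, and latent reconstruction) could be combined into one objective. This is not the CLSR implementation: the temperature adaptation rule, the loss weights, and the reconstruction inputs are assumptions.

```python
import torch
import torch.nn.functional as F

def clsr_style_loss(audio_emb, text_emb, recon_audio, recon_text, base_tau=0.1):
    """Hypothetical CLSR-style objective; all weighting choices are assumptions."""
    a, t = F.normalize(audio_emb, dim=-1), F.normalize(text_emb, dim=-1)
    labels = torch.arange(a.size(0))

    # Adaptive temperature: the better aligned the positive pairs, the more tolerant the loss.
    align = (a * t).sum(-1).mean().detach().clamp(0.0, 1.0)
    tau = base_tau * (1.0 + align)

    # Cross-modal contrast (audio-to-text and text-to-audio).
    cross = F.cross_entropy(a @ t.T / tau, labels) + F.cross_entropy(t @ a.T / tau, labels)

    # Intra-modal separability: repel different samples within each modality.
    eye = torch.eye(a.size(0), dtype=torch.bool)
    intra = ((a @ a.T / tau).masked_fill(eye, float('-inf')).logsumexp(-1).mean()
             + (t @ t.T / tau).masked_fill(eye, float('-inf')).logsumexp(-1).mean())

    # Latent representation reconstruction keeps the shared space informative.
    recon = F.mse_loss(recon_audio, audio_emb) + F.mse_loss(recon_text, text_emb)
    return cross + 0.1 * intra + 0.1 * recon

a, t = torch.randn(8, 256), torch.randn(8, 256)
loss = clsr_style_loss(a, t, recon_audio=a + 0.01 * torch.randn_like(a), recon_text=t.clone())
```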

Hierarchical Dynamic Image Harmonization

Nov 16, 2022
Haoxing Chen, Zhangxuan Gu, Yaohui Li, Jun Lan, Changhua Meng, Weiqiang Wang, Huaxiong Li

Image harmonization is a critical task in computer vision, which aims to adjust the foreground to make it compatible with the background. Recent works mainly focus on using global transformations (i.e., normalization and color curve rendering) to achieve visual consistency. However, these models ignore local consistency, and their model size limits their harmonization ability on edge devices. Inspired by dynamic deep networks that adapt their structures or parameters conditioned on the input, we propose a hierarchical dynamic network (HDNet) for efficient image harmonization, which adapts the model parameters and features from a local to a global view for better feature transformation. Specifically, local dynamics (LD) and mask-aware global dynamics (MGD) are applied. LD enables features of different channels and positions to change adaptively and improves the representation ability of geometric transformation through structural information learning. MGD learns the representations of foreground and background regions and their correlations for global harmonization. Experiments show that the proposed HDNet reduces parameters by more than 80% compared with previous methods while still achieving state-of-the-art performance on the popular iHarmony4 dataset. Our code is available at https://github.com/chenhaoxing/HDNet.
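
To make the local/global split concrete, here is a toy PyTorch module in the spirit of the abstract: a convolution predicts per-pixel, per-channel modulation (local dynamics), and mask-pooled foreground/background statistics drive a global affine transform (mask-aware global dynamics). Layer sizes, the modulation form, and the final compositing step are assumptions, not the HDNet design.

```python
import torch
import torch.nn as nn

class TinyDynamicHarmonizer(nn.Module):
    """Toy illustration of local + mask-aware global dynamics for harmonization."""
    def __init__(self, ch=3, hidden=16):
        super().__init__()
        # Local dynamics: predict per-pixel scale/shift from the image and mask.
        self.local = nn.Conv2d(ch + 1, 2 * ch, kernel_size=3, padding=1)
        # Mask-aware global dynamics: pooled fg/bg statistics -> global affine transform.
        self.global_fc = nn.Linear(2 * ch, 2 * ch)

    def forward(self, img, mask):                      # img: B x C x H x W, mask: B x 1 x H x W
        scale, shift = self.local(torch.cat([img, mask], 1)).chunk(2, dim=1)
        x = img * (1 + scale) + shift                  # locally modulated features
        fg = (x * mask).sum((2, 3)) / mask.sum((2, 3)).clamp(min=1e-6)
        bg = (x * (1 - mask)).sum((2, 3)) / (1 - mask).sum((2, 3)).clamp(min=1e-6)
        g_scale, g_shift = self.global_fc(torch.cat([fg, bg], 1)).chunk(2, dim=1)
        out = x * (1 + g_scale[..., None, None]) + g_shift[..., None, None]
        # Only the foreground should change; the background is kept as-is.
        return out * mask + img * (1 - mask)

harmonized = TinyDynamicHarmonizer()(torch.rand(2, 3, 64, 64), torch.rand(2, 1, 64, 64).round())
```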


Model-Aware Contrastive Learning: Towards Escaping Uniformity-Tolerance Dilemma in Training

Jul 16, 2022
Zizheng Huang, Chao Zhang, Huaxiong Li, Bo Wang, Chunlin Chen

Instance-discrimination contrastive learning (CL) has achieved significant success in learning transferable representations. A hardness-aware property related to the temperature $\tau$ of the CL loss is identified to play an essential role in automatically concentrating on hard negative samples. However, previous work also shows that there exists a uniformity-tolerance dilemma (UTD) in the CL loss, which can lead to unexpected performance degradation. Specifically, a smaller temperature helps to learn separable embeddings but has less tolerance to semantically related samples, which may result in a suboptimal embedding space, and vice versa. In this paper, we propose a Model-Aware Contrastive Learning (MACL) strategy to escape the UTD. In the undertrained phases, the high-similarity region around an anchor is unlikely to contain latent positive samples, so adopting a small temperature in these phases imposes a larger penalty on hard negative samples and improves the discrimination of the CL model. In contrast, a larger temperature in the well-trained phases helps to explore semantic structures thanks to greater tolerance to potential positive samples. In our implementation, the temperature in MACL is designed to adapt to the alignment property, which reflects the confidence of the CL model. Furthermore, we reexamine why contrastive learning requires a large number of negative samples from a unified gradient-reduction perspective. Based on MACL and these analyses, a new CL loss is proposed to improve the learned representations and training with small batch sizes.
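
A compact sketch of the core mechanism: an InfoNCE loss whose temperature grows with the measured alignment (mean positive-pair similarity), so the penalty on hard negatives is strong early in training and relaxes as the model becomes confident. The linear schedule and the temperature bounds are assumptions; MACL's actual adaptation rule may differ.

```python
import torch
import torch.nn.functional as F

def model_aware_info_nce(z1, z2, tau_min=0.07, tau_max=0.3):
    """Sketch of a model-aware InfoNCE loss: the temperature tracks the alignment
    (mean positive-pair similarity), a proxy for the confidence of the CL model."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    align = (z1 * z2).sum(-1).mean().detach().clamp(0.0, 1.0)   # confidence proxy
    tau = tau_min + (tau_max - tau_min) * align                 # adaptive temperature
    logits = z1 @ z2.T / tau
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels), tau

loss, tau = model_aware_info_nce(torch.randn(8, 128), torch.randn(8, 128))
```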


Shaping Visual Representations with Attributes for Few-Shot Learning

Dec 13, 2021
Haoxing Chen, Huaxiong Li, Yaohui Li, Chunlin Chen

Few-shot recognition aims to recognize novel categories under low-data regimes. Due to the scarcity of images, machines cannot obtain enough effective information, and the generalization ability of the model is extremely weak. By using auxiliary semantic modalities, recent metric-learning-based few-shot learning methods have achieved promising performance. However, these methods only augment the representations of support classes, while query images have no semantic information to enhance their representations. Instead, we propose attribute-shaped learning (ASL), which can normalize visual representations to predict attributes for query images. We further devise an attribute-visual attention module (AVAM), which utilizes attributes to generate more discriminative features. Our method enables visual representations to focus on important regions under attribute guidance. Experiments demonstrate that our method achieves competitive results on the CUB and SUN benchmarks. Our code is available at https://github.com/chenhaoxing/ASL.
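
The sketch below illustrates the attribute-guided attention idea in PyTorch: attribute vectors attend over spatial visual features to pool discriminative regions, and a small head predicts attributes for query images. Dimensions, layer choices, and names are assumptions rather than the AVAM implementation.

```python
import torch
import torch.nn as nn

class AttributeVisualAttention(nn.Module):
    """Toy attribute-visual attention: attributes reweight spatial visual features."""
    def __init__(self, vis_dim=64, attr_dim=32):
        super().__init__()
        self.query = nn.Linear(attr_dim, vis_dim)       # map attributes into the visual space
        self.attr_head = nn.Linear(vis_dim, attr_dim)   # predict attributes for query images

    def forward(self, feat, attr):                      # feat: B x C x H x W, attr: B x A
        B, C, H, W = feat.shape
        v = feat.flatten(2).transpose(1, 2)             # B x HW x C spatial features
        q = self.query(attr).unsqueeze(1)               # B x 1 x C attribute query
        attn = torch.softmax((q @ v.transpose(1, 2)) / C ** 0.5, dim=-1)  # B x 1 x HW
        attended = (attn @ v).squeeze(1)                # B x C attribute-guided pooling
        return attended, self.attr_head(attended)       # shaped feature + predicted attributes

shaped, pred_attr = AttributeVisualAttention()(torch.randn(4, 64, 5, 5), torch.randn(4, 32))
```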


Sparse Spatial Transformers for Few-Shot Learning

Sep 27, 2021
Haoxing Chen, Huaxiong Li, Yaohui Li, Chunlin Chen

Learning from limited data is a challenging task since the scarcity of data leads to poor generalization of the trained model. The classical global pooled representation is likely to lose useful local information. Recently, many few-shot learning methods address this challenge by using deep descriptors and learning a pixel-level metric. However, using deep descriptors as feature representations may lose the contextual information of the image. Moreover, most of these methods deal with each class in the support set independently, which cannot sufficiently utilize discriminative information and task-specific embeddings. In this paper, we propose a novel Transformer-based neural network architecture called Sparse Spatial Transformers (SSFormers), which can find task-relevant features and suppress task-irrelevant ones. Specifically, we first divide each input image into several image patches of different sizes to obtain dense local features. These features retain contextual information while expressing local information. Then, a sparse spatial transformer layer is proposed to find spatial correspondences between the query image and the entire support set, selecting task-relevant image patches and suppressing task-irrelevant ones. Finally, we propose an image patch matching module to calculate the distance between dense local representations to determine which category in the support set the query image belongs to. Extensive experiments on popular few-shot learning benchmarks show that our method achieves state-of-the-art performance. Our code is available at https://github.com/chenhaoxing/SSFormers.
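
A small PyTorch sketch of the sparsification idea: every query patch is compared against all support patches, only the top-k correspondences are kept, and the rest are suppressed as task-irrelevant. The top-k rule and the scoring are simplifying assumptions, not the actual SSFormers layer.

```python
import torch
import torch.nn.functional as F

def sparse_patch_matching(query_patches, support_patches, topk=3):
    """Each query patch attends only to its top-k most similar patches across
    the whole support set; all other correspondences are zeroed out."""
    q = F.normalize(query_patches, dim=-1)              # Nq x D query patches
    s = F.normalize(support_patches, dim=-1)            # Ns x D support patches
    sim = q @ s.T                                        # Nq x Ns patch similarities
    vals, idx = sim.topk(topk, dim=-1)                   # keep only task-relevant matches
    sparse_sim = torch.zeros_like(sim).scatter_(-1, idx, vals)
    return sparse_sim.sum(-1)                            # per-query-patch relevance score

scores = sparse_patch_matching(torch.randn(36, 64), torch.randn(5 * 36, 64))
```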


Multi-level Metric Learning for Few-shot Image Recognition

Apr 12, 2021
Haoxing Chen, Huaxiong Li, Yaohui Li, Chunlin Chen

Few-shot learning is devoted to training a model on only a few samples. Recently, methods based on local-descriptor metric learning have achieved great performance. Most of these approaches learn a model based on a pixel-level metric. However, such works can only measure the relations between query and support images at a single level, which is neither comprehensive nor effective. We argue that if query images can simultaneously be well classified via three distinct levels of similarity metrics, the query images within a class can be distributed more tightly in a smaller feature space, generating more discriminative feature maps. Motivated by this, we propose a novel Multi-level Metric Learning (MML) method for few-shot learning, which not only calculates pixel-level similarity but also considers the similarity of part-level features and the similarity of distributions. First, we use a feature extractor to obtain the feature maps of images. Second, a multi-level metric module is proposed to calculate the part-level, pixel-level, and distribution-level similarities simultaneously. Specifically, the distribution-level metric calculates the distribution distance (e.g., Wasserstein distance or Kullback-Leibler divergence) between query images and the support set, while the pixel-level and part-level metrics calculate the corresponding similarities. Finally, a fusion layer fuses the three kinds of relation scores to obtain the final similarity score. Extensive experiments on popular benchmarks demonstrate that MML significantly outperforms the current state-of-the-art methods.
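
The following PyTorch sketch shows one plausible way to fuse the three levels described above for a single query/support-class pair. The crude four-way part pooling, the symmetric Gaussian KL used for the distribution level, and the fixed fusion weights are all assumptions, not the MML modules.

```python
import torch
import torch.nn.functional as F

def multi_level_similarity(q_desc, s_desc, w=(1.0, 1.0, 1.0)):
    """Fuse pixel-, part-, and distribution-level similarities between a query's
    local descriptors (Nq x D) and a support class's descriptors (Ns x D)."""
    q, s = F.normalize(q_desc, dim=-1), F.normalize(s_desc, dim=-1)

    pixel = (q @ s.T).max(dim=-1).values.mean()                   # best-match pixel-level score
    part = F.cosine_similarity(q.view(4, -1, q.size(-1)).mean(1),
                               s.view(4, -1, s.size(-1)).mean(1), dim=-1).mean()  # 4 crude "parts"

    # Distribution level: symmetric KL between diagonal-Gaussian fits of the descriptors.
    mq, vq = q_desc.mean(0), q_desc.var(0) + 1e-6
    ms, vs = s_desc.mean(0), s_desc.var(0) + 1e-6
    kl = 0.5 * ((vq / vs + vs / vq + (mq - ms) ** 2 * (1 / vq + 1 / vs)).sum()
                - 2 * q_desc.size(1))
    dist = torch.exp(-kl)                                          # map divergence to (0, 1]

    return w[0] * pixel + w[1] * part + w[2] * dist                # fused relation score

score = multi_level_similarity(torch.randn(36, 64), torch.randn(5 * 36, 64))
```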


Hierarchical Representation based Query-Specific Prototypical Network for Few-Shot Image Classification

Mar 21, 2021
Yaohui Li, Huaxiong Li, Haoxing Chen, Chunlin Chen

Few-shot image classification aims to recognize unseen categories with a small number of labeled training samples. Recent metric-based frameworks tend to represent a support class by a fixed prototype (e.g., the mean of the support category) and classify according to the similarities between query instances and support prototypes. However, discriminative dominant regions may lie in uncertain areas of images and have various scales, which leads to a misaligned metric. Besides, a fixed prototype for one support category cannot fit all query instances or accurately reflect their distances to this category, which lowers the effectiveness of the metric. Therefore, query-specific dominant regions in support samples should be extracted for a high-quality metric. To address these problems, we propose a Hierarchical Representation based Query-Specific Prototypical Network (QPN), which generates a region-level prototype for each query sample and achieves both positional and dimensional semantic alignment simultaneously. Extensive experiments conducted on five benchmark datasets (including three fine-grained datasets) show that our proposed method outperforms the current state-of-the-art methods.
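
A minimal PyTorch sketch of the query-specific prototype idea: support local features are reweighted by their similarity to each query region, so the prototype adapts per query rather than being a fixed class mean. The attention form and the final scoring are assumptions, and QPN's hierarchical representation is omitted.

```python
import torch
import torch.nn.functional as F

def query_specific_prototype(query_feat, support_feats, tau=0.1):
    """Build a region-level, query-specific prototype from one class's support
    features and score the query against it."""
    q = F.normalize(query_feat, dim=-1)               # Nq x D query local features
    s = F.normalize(support_feats, dim=-1)            # Ns x D support local features (one class)
    attn = torch.softmax((q @ s.T) / tau, dim=-1)     # Nq x Ns: each query region picks regions
    proto = attn @ support_feats                      # Nq x D query-specific prototype per region
    return F.cosine_similarity(query_feat, proto, dim=-1).mean()  # query-to-class score

score = query_specific_prototype(torch.randn(25, 64), torch.randn(5 * 25, 64))
```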


Multi-scale Adaptive Task Attention Network for Few-Shot Learning

Nov 30, 2020
Haoxing Chen, Huaxiong Li, Yaohui Li, Chunlin Chen

The goal of few-shot learning is to classify unseen categories with few labeled samples. Recently, metric-learning methods based on low-level information have achieved satisfying performance, since local representations (LRs) are more consistent between seen and unseen classes. However, most of these methods deal with each category in the support set independently, which is not sufficient to measure the relations between features, especially within a given task. Moreover, such low-level-information-based metric learning suffers when dominant objects of different scales appear against complex backgrounds. To address these issues, this paper proposes a novel Multi-scale Adaptive Task Attention Network (MATANet) for few-shot learning. Specifically, we first use a multi-scale feature generator to produce multiple features at different scales. Then, an adaptive task attention module is proposed to select the most important LRs across the entire task. Afterwards, a similarity-to-class module and a fusion layer are utilized to calculate a joint multi-scale similarity between the query image and the support set. Extensive experiments on popular benchmarks clearly show the effectiveness of the proposed MATANet compared with state-of-the-art methods.
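
A single-scale PyTorch sketch of the task-attention step: each local representation (LR) of the query is scored by its best response to LRs pooled from the entire support set, and these scores become attention weights over the query's regions. The multi-scale generator and similarity-to-class module are omitted, and the exact weighting rule is an assumption.

```python
import torch
import torch.nn.functional as F

def task_attention_weights(query_lrs, task_lrs, tau=0.1):
    """Weight each query LR by how strongly it responds to LRs from the whole task,
    down-weighting task-irrelevant regions."""
    q = F.normalize(query_lrs, dim=-1)                 # Nq x D query LRs
    t = F.normalize(task_lrs, dim=-1)                  # Nt x D LRs from the entire support set
    relevance = (q @ t.T).max(dim=-1).values           # Nq: best task response per query LR
    return torch.softmax(relevance / tau, dim=0)       # attention over the query's LRs

weights = task_attention_weights(torch.randn(36, 64), torch.randn(5 * 5 * 36, 64))
```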
