Weide Liu

LCReg: Long-Tailed Image Classification with Latent Categories based Recognition

Sep 13, 2023
Weide Liu, Zhonghua Wu, Yiming Wang, Henghui Ding, Fayao Liu, Jie Lin, Guosheng Lin

In this work, we tackle the challenging problem of long-tailed image recognition. Previous long-tailed recognition approaches mainly focus on data augmentation or re-balancing strategies for the tail classes to give them more attention during model training. However, these methods are limited by the small number of training images for the tail classes, which results in poor feature representations. To address this issue, we propose the Latent Categories based long-tail Recognition (LCReg) method. Our hypothesis is that common latent features shared by head and tail classes can be used to improve feature representation. Specifically, we learn a set of class-agnostic latent features shared by both head and tail classes, and then apply semantic data augmentation to these latent features to implicitly increase the diversity of the training samples. We conduct extensive experiments on five long-tailed image recognition datasets, and the results show that our proposed method significantly improves the baselines.
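
As a rough illustration of the idea (not the paper's implementation), the sketch below shows class-agnostic latent prototypes shared by all classes, with a Gaussian feature-level perturbation standing in for semantic data augmentation; all names, dimensions, and the augmentation form are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentCategoryHead(nn.Module):
    """Illustrative sketch: class-agnostic latent features shared by head
    and tail classes, with Gaussian feature-level (semantic) augmentation.
    Dimensions and the augmentation form are assumptions, not the paper's code."""

    def __init__(self, feat_dim=512, num_latents=32, num_classes=1000):
        super().__init__()
        # Learnable latent category prototypes shared by all classes.
        self.latents = nn.Parameter(torch.randn(num_latents, feat_dim) * 0.02)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, feats, aug_std=0.1, training=True):
        # Soft-assign each image feature to the latent categories.
        attn = F.softmax(feats @ self.latents.t() / feats.shape[-1] ** 0.5, dim=-1)
        latent_feats = attn @ self.latents              # (B, feat_dim)
        if training:
            # Semantic augmentation: perturb latent features along random
            # directions to implicitly enrich training-sample diversity.
            latent_feats = latent_feats + aug_std * torch.randn_like(latent_feats)
        return self.classifier(feats + latent_feats)

# Toy usage with a batch of backbone features:
logits = LatentCategoryHead()(torch.randn(8, 512))
```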

* accepted by Pattern Recognition. arXiv admin note: substantial text overlap with arXiv:2206.01010 

ELFNet: Evidential Local-global Fusion for Stereo Matching

Aug 01, 2023
Jieming Lou, Weide Liu, Zhuo Chen, Fayao Liu, Jun Cheng

Although existing stereo matching models have achieved continuous improvement, they often face issues of trustworthiness due to the absence of uncertainty estimation. In addition, effectively leveraging multi-scale and multi-view knowledge of stereo pairs remains unexplored. In this paper, we introduce the Evidential Local-global Fusion (ELF) framework for stereo matching, which provides both uncertainty estimation and confidence-aware fusion through trustworthy heads. Instead of predicting the disparity map alone, our model estimates an evidential-based disparity that accounts for both aleatoric and epistemic uncertainties. With the normal inverse-Gamma distribution as a bridge, the proposed framework realizes intra-evidential fusion of multi-level predictions and inter-evidential fusion between cost-volume-based and transformer-based stereo matching. Extensive experimental results show that the proposed framework exploits multi-view information effectively and achieves state-of-the-art overall performance in both accuracy and cross-domain generalization. The code is available at https://github.com/jimmy19991222/ELFNet.
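
For intuition, here is a small sketch of how disparity and its aleatoric and epistemic uncertainties can be read off a Normal inverse-Gamma (NIG) head, using the standard deep evidential regression moments; the fusion rule shown is the common NIG summation from the evidential fusion literature and is only an assumption about how the two branches could be combined, not ELFNet's exact rule.

```python
import torch

def nig_uncertainties(gamma, nu, alpha, beta):
    """Standard evidential-regression moments of NIG(gamma, nu, alpha, beta):
    predicted disparity, aleatoric and epistemic uncertainty (requires alpha > 1)."""
    aleatoric = beta / (alpha - 1)          # E[sigma^2]
    epistemic = beta / (nu * (alpha - 1))   # Var[mu]
    return gamma, aleatoric, epistemic

def nig_fuse(p, q):
    """Illustrative NIG summation of two evidential predictions p, q
    (each a tuple gamma, nu, alpha, beta); an assumption, not ELFNet's exact rule."""
    g1, n1, a1, b1 = p
    g2, n2, a2, b2 = q
    nu = n1 + n2
    gamma = (n1 * g1 + n2 * g2) / nu
    alpha = a1 + a2 + 0.5
    beta = b1 + b2 + 0.5 * (n1 * (g1 - gamma) ** 2 + n2 * (g2 - gamma) ** 2)
    return gamma, nu, alpha, beta

# Fuse a cost-volume branch with a transformer branch (toy values):
fused = nig_fuse((torch.tensor(10.0), torch.tensor(2.0), torch.tensor(3.0), torch.tensor(1.0)),
                 (torch.tensor(11.0), torch.tensor(1.0), torch.tensor(2.5), torch.tensor(0.8)))
print(nig_uncertainties(*fused))
```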

* ICCV 2023 

Integrating Large Pre-trained Models into Multimodal Named Entity Recognition with Evidential Fusion

Jun 29, 2023
Weide Liu, Xiaoyang Zhong, Jingwen Hou, Shaohua Li, Haozhe Huang, Yuming Fang

Multimodal Named Entity Recognition (MNER) is a crucial task for information extraction from social media platforms such as Twitter. Most current methods rely on attention weights to extract information from both text and images, but these weights are often unreliable and lack interpretability. To address this problem, we propose incorporating uncertainty estimation into the MNER task to produce trustworthy predictions. Our proposed algorithm models the distribution of each modality as a Normal-inverse Gamma distribution and fuses them into a unified distribution with an evidential fusion mechanism, enabling hierarchical characterization of uncertainties and improving prediction accuracy and trustworthiness. Additionally, we explore the potential of pre-trained large foundation models in MNER and propose an efficient fusion approach that leverages their robust feature representations. Experiments on two datasets demonstrate that our proposed method outperforms the baselines and achieves new state-of-the-art performance.

Harmonizing Base and Novel Classes: A Class-Contrastive Approach for Generalized Few-Shot Segmentation

Mar 24, 2023
Weide Liu, Zhonghua Wu, Yang Zhao, Yuming Fang, Chuan-Sheng Foo, Jun Cheng, Guosheng Lin

Current methods for few-shot segmentation (FSSeg) have mainly focused on improving the performance on novel classes while neglecting the performance on base classes. To overcome this limitation, the task of generalized few-shot semantic segmentation (GFSSeg) has been introduced, aiming to predict segmentation masks for both base and novel classes. However, current prototype-based methods do not explicitly consider the relationship between base and novel classes when updating prototypes, leading to limited performance in identifying the true categories. To address this challenge, we propose a class contrastive loss and a class relationship loss to regulate prototype updates and encourage a large distance between prototypes of different classes, distinguishing the classes from each other while maintaining the performance on the base classes. Our proposed approach achieves new state-of-the-art performance for the generalized few-shot segmentation task on the PASCAL VOC and MS COCO datasets.
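
A minimal sketch of one way such a class-contrastive term could be written, pushing prototypes of different classes apart; the hinge-on-cosine-similarity form and the margin are placeholders rather than the paper's exact losses.

```python
import torch
import torch.nn.functional as F

def prototype_contrastive_loss(prototypes, margin=0.5):
    """Encourage a large distance between prototypes of different classes.
    `prototypes`: (C, D) tensor, one prototype per (base or novel) class.
    The hinge-on-cosine-similarity form is an illustrative assumption."""
    protos = F.normalize(prototypes, dim=-1)
    sim = protos @ protos.t()                                     # (C, C) cosine similarities
    off_diag = sim - torch.eye(len(protos), device=sim.device)    # zero out self-similarity
    # Penalize pairs of different classes that are closer than the margin.
    return F.relu(off_diag - margin).sum() / (len(protos) * (len(protos) - 1))

loss = prototype_contrastive_loss(torch.randn(21, 256))  # e.g., PASCAL VOC base + novel classes
```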

CRCNet: Few-shot Segmentation with Cross-Reference and Region-Global Conditional Networks

Aug 23, 2022
Weide Liu, Chi Zhang, Guosheng Lin, Fayao Liu

Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images. In this paper, we propose the Cross-Reference and Local-Global Conditional Network (CRCNet) for few-shot segmentation. Unlike previous works that only predict the query image's mask, our proposed model concurrently makes predictions for both the support image and the query image. With a cross-reference mechanism, our network can better find the objects co-occurring in the two images, which benefits the few-shot segmentation task. To further improve feature comparison, we develop a local-global conditional module to capture both global and local relations. We also develop a mask refinement module to recurrently refine the prediction of the foreground regions. Experiments on the PASCAL VOC 2012, MS COCO, and FSS-1000 datasets show that our network achieves new state-of-the-art performance.
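
The sketch below illustrates the general flavour of a cross-reference mechanism, where each branch re-weights the other's features so that channels active in both images are emphasized; the layer sizes and the gating form are assumptions, not CRCNet's actual module.

```python
import torch
import torch.nn as nn

class CrossReference(nn.Module):
    """Illustrative cross-reference block: each branch gates the other's
    feature map with a channel-attention vector derived from its counterpart,
    highlighting co-occurrent channels. Layer sizes are assumptions."""

    def __init__(self, channels=256):
        super().__init__()
        self.fc_s = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())
        self.fc_q = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())

    def forward(self, support, query):                 # (B, C, H, W) each
        s_vec = self.fc_s(support.mean(dim=(2, 3)))    # (B, C) gate from support
        q_vec = self.fc_q(query.mean(dim=(2, 3)))      # (B, C) gate from query
        # Channels active in both images get emphasized in both branches.
        return support * q_vec[:, :, None, None], query * s_vec[:, :, None, None]

s_out, q_out = CrossReference()(torch.randn(2, 256, 32, 32), torch.randn(2, 256, 32, 32))
```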

* arXiv admin note: substantial text overlap with arXiv:2003.10658 

Long-tailed Recognition by Learning from Latent Categories

Jun 02, 2022
Weide Liu, Zhonghua Wu, Yiming Wang, Henghui Ding, Fayao Liu, Jie Lin, Guosheng Lin

In this work, we address the challenging task of long-tailed image recognition. Previous long-tailed recognition methods commonly focus on data augmentation or re-balancing strategies that give more attention to tail classes during model training. However, due to the limited number of training images for tail classes, the diversity of tail-class images is still restricted, which results in poor feature representations. In this work, we hypothesize that common latent features among the head and tail classes can be used to obtain better feature representations. Motivated by this, we introduce the Latent Categories based long-tail Recognition (LCReg) method. Specifically, we propose to learn a set of class-agnostic latent features shared among the head and tail classes. Then, we implicitly enrich the diversity of training samples by applying semantic data augmentation to the latent features. Extensive experiments on five long-tailed image recognition datasets demonstrate that our proposed LCReg significantly outperforms previous methods and achieves state-of-the-art results.

Distilling Knowledge from Object Classification to Aesthetics Assessment

Jun 02, 2022
Jingwen Hou, Henghui Ding, Weisi Lin, Weide Liu, Yuming Fang

In this work, we point out that the major dilemma of image aesthetics assessment (IAA) comes from the abstract nature of aesthetic labels: a vast variety of distinct contents can correspond to the same aesthetic label. On the one hand, during inference, the IAA model is required to relate various distinct contents to the same aesthetic label. On the other hand, during training, it is hard for the IAA model to learn to distinguish different contents merely with supervision from aesthetic labels, since aesthetic labels are not directly related to any specific content. To deal with this dilemma, we propose to distill knowledge on semantic patterns for a vast variety of image contents from multiple pre-trained object classification (POC) models to an IAA model. Expecting that the combination of multiple POC models provides sufficient knowledge on various image contents, the IAA model can more easily learn to relate various distinct contents to a limited number of aesthetic labels. By supervising an end-to-end single-backbone IAA model with the distilled knowledge, the performance of the IAA model is significantly improved by 4.8% in SRCC compared to the version trained only with ground-truth aesthetic labels. On specific categories of images, the SRCC improvement brought by the proposed method reaches up to 7.2%. Peer comparison also shows that our method outperforms 10 previous IAA methods.
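
A hedged sketch of what such a distillation objective could look like: the IAA student regresses aesthetic scores while matching its features to several frozen POC teachers through learnable projections. The loss form, projection layers, and weighting are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_step(student_feat, teacher_feats, proj_layers,
                      aesthetic_pred, aesthetic_gt, lam=1.0):
    """One illustrative training objective: regress ground-truth aesthetic scores
    while matching the student's features to each frozen POC teacher's features
    through learnable projections. Loss form and weighting are assumptions."""
    distill = sum(F.mse_loss(proj(student_feat), t.detach())
                  for proj, t in zip(proj_layers, teacher_feats))
    aesthetic = F.mse_loss(aesthetic_pred, aesthetic_gt)
    return aesthetic + lam * distill

# Toy usage: one student feature, two teacher features of different widths.
projs = nn.ModuleList([nn.Linear(512, 2048), nn.Linear(512, 1024)])
loss = distillation_step(torch.randn(4, 512),
                         [torch.randn(4, 2048), torch.randn(4, 1024)],
                         projs, torch.rand(4, 1), torch.rand(4, 1))
```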

Few-shot Segmentation with Optimal Transport Matching and Message Flow

Aug 19, 2021
Weide Liu, Chi Zhang, Henghui Ding, Tzu-Yi Hung, Guosheng Lin

We address the challenging task of few-shot segmentation in this work. Fully utilizing the support information is essential for few-shot semantic segmentation. Previous methods typically adopt masked average pooling over the support feature to extract the support clues as a global vector, which is usually dominated by the most salient part and loses some important clues. In this work, we argue that the information of every support pixel should be transferred to all query pixels, and we propose a Correspondence Matching Network (CMNet) with an Optimal Transport Matching module to mine the correspondence between the query and support images. Besides, it is important to fully utilize both local and global information from the annotated support images. To this end, we propose a Message Flow module to propagate messages along the inner-flow within the same image and the cross-flow between support and query images, which greatly helps enhance the local feature representations. We further formulate few-shot segmentation as a multi-task learning problem to alleviate the domain gap between different datasets. Experiments on the PASCAL VOC 2012, MS COCO, and FSS-1000 datasets show that our network achieves new state-of-the-art few-shot segmentation performance.
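
For reference, a minimal sketch of entropic optimal transport between query and support pixel features via Sinkhorn iterations; the cost (one minus cosine similarity), uniform marginals, and hyperparameters are assumptions rather than CMNet's exact matching module.

```python
import torch
import torch.nn.functional as F

def sinkhorn_matching(query_feats, support_feats, eps=0.1, iters=50):
    """Entropic OT between query pixels (N, D) and support pixels (M, D).
    Returns an (N, M) transport plan; cost and marginals are illustrative choices."""
    cost = 1 - F.normalize(query_feats, dim=-1) @ F.normalize(support_feats, dim=-1).t()
    K = torch.exp(-cost / eps)                        # Gibbs kernel
    mu = torch.full((cost.shape[0],), 1.0 / cost.shape[0])   # uniform row marginal
    nu = torch.full((cost.shape[1],), 1.0 / cost.shape[1])   # uniform column marginal
    u = torch.ones_like(mu)
    for _ in range(iters):                            # Sinkhorn-Knopp scaling
        u = mu / (K @ (nu / (K.t() @ u)))
    v = nu / (K.t() @ u)
    return u[:, None] * K * v[None, :]                # transport plan between pixels

plan = sinkhorn_matching(torch.randn(1024, 256), torch.randn(1024, 256))
```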

Cross-Image Region Mining with Region Prototypical Network for Weakly Supervised Segmentation

Aug 17, 2021
Weide Liu, Xiangfei Kong, Tzu-Yi Hung, Guosheng Lin

Weakly supervised image segmentation trained with image-level labels usually suffers from inaccurate coverage of object areas when generating the pseudo ground truth. This is because the object activation maps are trained with a classification objective and lack the ability to generalize. To improve the generality of the object activation maps, we propose a Region Prototypical Network (RPNet) to explore the cross-image object diversity of the training set. Similar object parts across images are identified via region feature comparison, and object confidence is propagated between regions to discover new object areas while background regions are suppressed. Experiments show that the proposed method generates more complete and accurate pseudo object masks while achieving state-of-the-art performance on PASCAL VOC 2012 and MS COCO. In addition, we investigate the robustness of the proposed method on reduced training sets.
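
A small, hedged sketch of the cross-image idea: object confidence is shared between regions with similar features, so under-activated object parts in one image can be recovered from confident regions elsewhere. The affinity form and temperature are assumptions, not RPNet's implementation.

```python
import torch
import torch.nn.functional as F

def propagate_region_confidence(region_feats, region_conf, temperature=0.1):
    """Illustrative cross-image confidence propagation over pooled region features.
    `region_feats`: (R, D) features of regions gathered from several images;
    `region_conf`: (R,) initial CAM-based object confidences."""
    normed = F.normalize(region_feats, dim=-1)
    weights = F.softmax(normed @ normed.t() / temperature, dim=-1)  # (R, R) affinities
    return weights @ region_conf                                    # (R,) refined confidences

# 200 regions pooled from a mini-batch of images, with initial CAM-based confidences.
refined = propagate_region_confidence(torch.randn(200, 256), torch.rand(200))
```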

Few-Shot Segmentation with Global and Local Contrastive Learning

Aug 11, 2021
Weide Liu, Zhonghua Wu, Henghui Ding, Fayao Liu, Jie Lin, Guosheng Lin

In this work, we address the challenging task of few-shot segmentation. Previous few-shot segmentation methods mainly employ the information of support images as guidance for query image segmentation. Although some works propose to build cross-references between support and query images, their extraction of query information still depends on the support images. Here, we propose to extract information from the query itself independently to benefit the few-shot segmentation task. To this end, we first propose a prior extractor that learns query information from unlabeled images with our proposed global-local contrastive learning. We then extract a set of predetermined priors via this prior extractor. With the obtained priors, we generate prior region maps for query images, which locate the objects and serve as guidance for cross interaction with support features. In this way, the extraction of query information is detached from the support branch, overcoming the limitation imposed by the support images and yielding more informative query clues for better interaction. Without bells and whistles, the proposed approach achieves new state-of-the-art performance for the few-shot segmentation task on the PASCAL-5$^{i}$ and COCO datasets.
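
As a hedged sketch, the global branch of such global-local contrastive learning could take the familiar InfoNCE form between two augmented views of the same unlabeled images, with local terms reusing the same form on region-pooled features; all specifics below are assumptions rather than the paper's objective.

```python
import torch
import torch.nn.functional as F

def info_nce(feats_a, feats_b, temperature=0.07):
    """Illustrative InfoNCE loss between two augmented views of the same
    unlabeled images, standing in for the global branch of global-local
    contrastive learning. Temperature and pairing are assumptions."""
    a = F.normalize(feats_a, dim=-1)
    b = F.normalize(feats_b, dim=-1)
    logits = a @ b.t() / temperature                    # (B, B) similarity matrix
    targets = torch.arange(len(a), device=a.device)     # the matching view is the positive
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(16, 256), torch.randn(16, 256))
```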
