Rui Xie

Towards Visual Taxonomy Expansion

Sep 12, 2023
Tinghui Zhu, Jingping Liu, Jiaqing Liang, Haiyun Jiang, Yanghua Xiao, Zongyu Wang, Rui Xie, Yunsen Xian

The taxonomy expansion task is essential for organizing the ever-increasing volume of new concepts into existing taxonomies. Most existing methods rely exclusively on textual semantics, which prevents them from generalizing to unseen terms and exposes them to the "Prototypical Hypernym Problem." In this paper, we propose Visual Taxonomy Expansion (VTE), which introduces visual features into the taxonomy expansion task. We design a textual hypernymy learning task and a visual prototype learning task to cluster textual and visual semantics, respectively. Beyond these single-modality tasks, we introduce a hyper-proto constraint that integrates textual and visual semantics to produce fine-grained visual semantics. Our method is evaluated on two datasets, where it obtains compelling results: on the Chinese taxonomy dataset, it improves accuracy by 8.75% and also outperforms ChatGPT.
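
For intuition, here is a minimal PyTorch sketch of how a textual hypernymy objective, a visual prototype objective, and a cross-modal hyper-proto constraint could be combined; the function names, similarity measures, and margins are illustrative assumptions rather than the paper's released implementation.

```python
# Illustrative multi-task sketch (assumed formulation, not the authors' code).
import torch
import torch.nn.functional as F

def hypernymy_loss(child_emb, parent_emb, neg_parent_emb, margin=0.5):
    """A child term should be closer to its true hypernym than to a negative one."""
    pos = F.cosine_similarity(child_emb, parent_emb)
    neg = F.cosine_similarity(child_emb, neg_parent_emb)
    return F.relu(margin - pos + neg).mean()

def visual_prototype_loss(img_emb, prototypes, labels, temperature=0.1):
    """Pull each image embedding toward the prototype of its concept (InfoNCE-style)."""
    logits = F.normalize(img_emb, dim=-1) @ F.normalize(prototypes, dim=-1).t()
    return F.cross_entropy(logits / temperature, labels)

def hyper_proto_constraint(parent_text_emb, child_visual_proto, margin=0.3):
    """Tie a hypernym's textual embedding to the visual prototype of its children."""
    return F.relu(margin - F.cosine_similarity(parent_text_emb, child_visual_proto)).mean()

# Joint objective: textual task + visual task + cross-modal hyper-proto constraint.
# total_loss = hypernymy_loss(...) + visual_prototype_loss(...) + hyper_proto_constraint(...)
```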

* ACMMM accepted paper 

Contrastive Label Disambiguation for Self-Supervised Terrain Traversability Learning in Off-Road Environments

Jul 06, 2023
Hanzhang Xue, Xiaochang Hu, Rui Xie, Hao Fu, Liang Xiao, Yiming Nie, Bin Dai

Discriminating the traversability of terrains is a crucial task for autonomous driving in off-road environments. However, it is challenging due to the diverse, ambiguous, and platform-specific nature of off-road traversability. In this paper, we propose a novel self-supervised terrain traversability learning framework that utilizes a contrastive label disambiguation mechanism. First, weakly labeled training samples with pseudo labels are automatically generated by projecting actual driving experiences onto terrain models constructed in real time. Subsequently, a prototype-based contrastive representation learning method is designed to learn distinguishable embeddings, facilitating the self-supervised updating of those pseudo labels. Through the iterative interaction between representation learning and pseudo-label updating, the ambiguities in the pseudo labels are gradually eliminated, enabling platform-specific and task-specific traversability to be learned without any human-provided annotations. Experimental results on the RELLIS-3D dataset and our Gobi Desert driving dataset demonstrate the effectiveness of the proposed method.
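
As a rough sketch of the label-disambiguation loop (an assumed formulation, not the authors' released code), class prototypes can be maintained as running averages of embeddings and used to soften noisy pseudo labels:

```python
# Prototype-based pseudo-label refinement (illustrative sketch).
import torch
import torch.nn.functional as F

def update_prototypes(prototypes, embeddings, soft_labels, momentum=0.99):
    """EMA update of per-class prototypes (C, D) from batch embeddings (B, D)."""
    weights = soft_labels / (soft_labels.sum(dim=0, keepdim=True) + 1e-8)  # (B, C)
    batch_proto = weights.t() @ embeddings                                 # (C, D)
    return momentum * prototypes + (1 - momentum) * batch_proto

def refine_pseudo_labels(embeddings, prototypes, old_soft_labels, alpha=0.9, temperature=0.1):
    """Blend the previous pseudo labels with prototype-similarity posteriors."""
    sims = F.normalize(embeddings, dim=-1) @ F.normalize(prototypes, dim=-1).t()
    posteriors = F.softmax(sims / temperature, dim=-1)
    return alpha * old_soft_labels + (1 - alpha) * posteriors
```

Iterating these two steps lets ambiguous pseudo labels gradually collapse toward the nearest traversability prototype.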

* 9 pages, 11 figures 

PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization

Jun 08, 2023
Yidong Wang, Zhuohao Yu, Zhengran Zeng, Linyi Yang, Cunxiang Wang, Hao Chen, Chaoya Jiang, Rui Xie, Jindong Wang, Xing Xie, Wei Ye, Shikun Zhang, Yue Zhang

Instruction tuning large language models (LLMs) remains a challenging task, owing to the complexity of hyperparameter selection and the difficulty of evaluating the tuned models. Determining the optimal hyperparameters requires an automatic, robust, and reliable evaluation benchmark, yet establishing such a benchmark is non-trivial due to the challenges of evaluation accuracy and privacy protection. In response, we introduce a judge large language model, named PandaLM, trained to distinguish the superior model among several LLMs. PandaLM's focus extends beyond the objective correctness of responses, the main focus of traditional evaluation datasets, to vital subjective factors such as relative conciseness, clarity, adherence to instructions, comprehensiveness, and formality. To ensure PandaLM's reliability, we collect a diverse human-annotated test dataset in which all contexts are generated by humans and labels are aligned with human preferences. Our results indicate that PandaLM-7B achieves 93.75% of GPT-3.5's evaluation ability and 88.28% of GPT-4's in terms of F1-score on our test dataset. PandaLM makes LLM evaluation fairer and less costly, as evidenced by the significant improvements of models tuned with PandaLM over counterparts trained with Alpaca's default hyperparameters. In addition, PandaLM does not depend on API-based evaluations, thus avoiding potential data leakage. All resources of PandaLM are released at https://github.com/WeOpenML/PandaLM.
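
Conceptually, the protocol reduces to a pairwise judging loop scored against human preference labels. The sketch below assumes a generic judge(instruction, response_a, response_b) callable returning 0/1/2 (tie / A wins / B wins); it is not PandaLM's actual interface, which is documented in the linked repository.

```python
# Scoring a pairwise judge against human preference labels (illustrative sketch).
from sklearn.metrics import f1_score

def evaluate_judge(judge, test_set, human_labels):
    """judge(instruction, resp_a, resp_b) -> 0 (tie), 1 (A better), or 2 (B better)."""
    preds = [judge(x["instruction"], x["response_a"], x["response_b"]) for x in test_set]
    return f1_score(human_labels, preds, average="macro")

# Toy usage with a naive length-based baseline standing in for the judge model.
toy_set = [{"instruction": "q", "response_a": "short", "response_b": "a longer answer"}]
baseline = lambda q, a, b: 1 if len(a) > len(b) else 2
print(evaluate_judge(baseline, toy_set, human_labels=[2]))  # 1.0
```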


Causality-aware Concept Extraction based on Knowledge-guided Prompting

May 10, 2023
Siyu Yuan, Deqing Yang, Jinxi Liu, Shuyu Tian, Jiaqing Liang, Yanghua Xiao, Rui Xie

Concepts benefit natural language understanding but are far from complete in existing knowledge graphs (KGs). Recently, pre-trained language models (PLMs) have been widely used in text-based concept extraction (CE). However, PLMs tend to mine co-occurrence associations from massive corpora as pre-trained knowledge rather than the real causal effect between tokens. As a result, this pre-trained knowledge acts as a confounder that leads PLMs to extract biased concepts based on spurious co-occurrence correlations, inevitably resulting in low precision. In this paper, through the lens of a Structural Causal Model (SCM), we propose equipping the PLM-based extractor with a knowledge-guided prompt as an intervention to alleviate concept bias. The prompt adopts the topic of the given entity from the existing knowledge in KGs to mitigate the spurious co-occurrence correlations between entities and biased concepts. Extensive experiments on representative multilingual KG datasets show that our proposed prompt effectively alleviates concept bias and improves the performance of PLM-based CE models. The code has been released at https://github.com/siyuyuan/KPCE.
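
The intervention can be pictured as prepending KG-derived topic knowledge to the extraction prompt; the template and toy topic lookup below are illustrative assumptions, not the released KPCE code.

```python
# Knowledge-guided prompt construction (illustrative template, not the paper's exact one).
TOPIC_KG = {"Lionel Messi": "sports", "BERT": "technology"}  # stands in for a real KG lookup

def build_prompt(entity, context):
    topic = TOPIC_KG.get(entity, "general")
    return (f"Topic: {topic}\n"
            f"Text: {context}\n"
            f'Question: What concept does "{entity}" belong to under this topic?')

print(build_prompt("BERT", "BERT is a pre-trained language model released by Google."))
```

Conditioning on the topic steers the extractor away from concepts that merely co-occur with the entity in pre-training corpora.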

* Accepted to ACL 2023 

Exploiting Pseudo Image Captions for Multimodal Summarization

May 09, 2023
Chaoya Jiang, Rui Xie, Wei Ye, Jinan Sun, Shikun Zhang

Cross-modal contrastive learning in vision-language pre-training (VLP) faces the challenge of (partial) false negatives. In this paper, we study this problem from the perspective of Mutual Information (MI) optimization. It is well known that the InfoNCE loss used in contrastive learning maximizes a lower bound on the MI between anchors and their positives, and we theoretically prove that the MI involving negatives also matters when noise is commonly present. Guided by a more general lower bound for optimization, we propose a contrastive learning strategy regulated by progressively refined cross-modal similarity, which more accurately optimizes the MI between an image/text anchor and its negative texts/images instead of improperly minimizing it. Our method performs competitively on four downstream cross-modal tasks and systematically balances the beneficial and harmful effects of (partial) false negative samples under theoretical guidance.
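
A minimal PyTorch sketch of the weighting idea, assuming L2-normalized embeddings and a refined similarity matrix sim_prior in [0, 1]; the exact bound and refinement schedule in the paper differ, so this is only an illustration:

```python
# InfoNCE with negatives softly down-weighted by a refined cross-modal similarity prior.
import torch
import torch.nn.functional as F

def reweighted_infonce(img_emb, txt_emb, sim_prior, temperature=0.07):
    """img_emb, txt_emb: (B, D) normalized; sim_prior: (B, B), higher = more likely false negative."""
    logits = img_emb @ txt_emb.t() / temperature
    labels = torch.arange(img_emb.size(0))
    neg_weights = 1.0 - sim_prior          # likely false negatives get small weights
    neg_weights.fill_diagonal_(1.0)        # keep the true positive at full weight
    weighted_logits = logits + torch.log(neg_weights + 1e-8)  # weight exp-terms in log space
    return F.cross_entropy(weighted_logits, labels)
```

Down-weighting rather than discarding suspected false negatives keeps them in the objective without pushing semantically matched pairs apart at full force.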

* Accepted at ACL2023 Findings 

Exploring Vision-Language Models for Imbalanced Learning

Apr 04, 2023
Yidong Wang, Zhuohao Yu, Jindong Wang, Qiang Heng, Hao Chen, Wei Ye, Rui Xie, Xing Xie, Shikun Zhang

Vision-language models (VLMs) that use contrastive language-image pre-training have shown promising zero-shot classification performance. However, their performance on imbalanced datasets, where the class distribution of the training data is skewed, is relatively poor, leading to weak predictions for minority classes. For instance, CLIP achieves only 5% accuracy on the iNaturalist18 dataset. We propose adding a lightweight decoder to VLMs to avoid the out-of-memory (OOM) problem caused by a large number of classes and to capture nuanced features of tail classes. We then explore improving VLMs with prompt tuning, fine-tuning, and imbalanced-learning algorithms such as Focal Loss, Balanced Softmax, and Distribution Alignment. Experiments demonstrate that the performance of VLMs can be further boosted when combined with the decoder and imbalanced-learning methods. Specifically, our improved VLMs outperform zero-shot classification by an average accuracy of 6.58%, 69.82%, and 6.17% on ImageNet-LT, iNaturalist18, and Places-LT, respectively. We further analyze the influence of pre-training data size, backbones, and training cost. Our study highlights the significance of imbalanced-learning algorithms in the face of VLMs pre-trained on huge datasets. We release our code at https://github.com/Imbalance-VLM/Imbalance-VLM.
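
As a concrete example of the imbalanced-learning losses involved, Balanced Softmax shifts the logits by log class priors so that head classes stop dominating the gradient; this short sketch is generic and not tied to the released Imbalance-VLM code.

```python
# Balanced Softmax: cross-entropy on prior-adjusted logits.
import torch
import torch.nn.functional as F

def balanced_softmax_loss(logits, targets, class_counts):
    """logits: (B, C); targets: (B,); class_counts: (C,) class frequencies in the training set."""
    prior = class_counts.float() / class_counts.sum()
    adjusted_logits = logits + torch.log(prior + 1e-12)
    return F.cross_entropy(adjusted_logits, targets)
```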

* Technical report; 14 pages; code: https://github.com/Imbalance-VLM/Imbalance-VLM 

Optimal Sampling Designs for Multi-dimensional Streaming Time Series with Application to Power Grid Sensor Data

Mar 14, 2023
Rui Xie, Shuyang Bai, Ping Ma

Internet of Things (IoT) systems generate massive, high-speed, temporally correlated streaming data and are often connected with online inference tasks under computational or energy constraints. Online analysis of these streaming time series often faces a trade-off between statistical efficiency and computational cost. One important approach to balancing this trade-off is sampling, where only a small portion of the sample is selected for model fitting and updating. Motivated by the demands of dynamic relationship analysis in IoT systems, we study data-dependent sample selection and online inference for multi-dimensional streaming time series, aiming to provide low-cost real-time analysis of high-speed power grid electricity consumption data. Inspired by the D-optimality criterion in the design of experiments, we propose a class of online data reduction methods that achieve an optimal sampling criterion and improve the computational efficiency of online analysis. We show that the optimal solution amounts to a strategy that mixes Bernoulli sampling and leverage score sampling. The leverage score sampling involves auxiliary estimations that have a computational advantage over recursive least squares updates, and the theoretical properties of these auxiliary estimations are also discussed. When applied to European power grid consumption data, the proposed leverage-score-based sampling methods outperform the benchmark sampling method in online estimation and prediction. The general applicability of the sampling-assisted online estimation method is assessed via simulation studies.
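
The resulting inclusion rule can be pictured with a small numpy sketch: each new observation is kept with a probability that mixes a uniform Bernoulli rate with a rate driven by its (approximate) leverage score. The mixing weight and scaling constant below are placeholders, not the paper's optimal values.

```python
# Mixture of Bernoulli and leverage-score sampling for a streaming regression (sketch).
import numpy as np

def inclusion_probability(x, XtX_inv, bernoulli_rate=0.05, mix=0.5, scale=10.0):
    """x: new covariate vector (d,); XtX_inv: running (X^T X)^{-1}; scale: tuning constant."""
    leverage = float(x @ XtX_inv @ x)          # approximate leverage of the new point
    lev_prob = min(1.0, scale * leverage)
    return mix * bernoulli_rate + (1.0 - mix) * lev_prob

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                  # history used to form (X^T X)^{-1}
XtX_inv = np.linalg.inv(X.T @ X)
x_new = rng.normal(size=3)
keep = rng.random() < inclusion_probability(x_new, XtX_inv)
print(keep)
```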

* Accepted by The Annals of Applied Statistics 

Focus Is What You Need For Chinese Grammatical Error Correction

Oct 27, 2022
Jingheng Ye, Yinghui Li, Shirong Ma, Rui Xie, Wei Wu, Hai-Tao Zheng

Chinese Grammatical Error Correction (CGEC) aims to automatically detect and correct grammatical errors in Chinese text. For a long time, researchers have regarded CGEC as a task with a certain degree of uncertainty; that is, an ungrammatical sentence may often have multiple valid references. However, we argue that although this is a reasonable hypothesis, it is too demanding for the mainstream models of this era. In this paper, we first show that multiple references do not actually bring positive gains to model training. On the contrary, the CGEC model benefits from focusing on a small but essential subset of the data during training. We therefore propose a simple yet effective training strategy called OneTarget to improve the focus ability of CGEC models and thus improve CGEC performance. Extensive experiments and detailed analyses demonstrate the correctness of our finding and the effectiveness of the proposed method.
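
One way to picture a single-reference training set is to keep, for each source sentence, exactly one of its references; the minimum-edit criterion used in this sketch is an assumption for illustration and not necessarily the paper's OneTarget selection rule.

```python
# Collapse multi-reference CGEC data to one target per source (assumed selection rule).
from difflib import SequenceMatcher

def pick_single_target(source, references):
    """Keep the reference most similar to the source, i.e. the most conservative correction."""
    return max(references, key=lambda ref: SequenceMatcher(None, source, ref).ratio())

print(pick_single_target("他昨天去学校了了", ["他昨天去学校了", "昨天他去了学校"]))
```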

* Submitted to ICASSP2023 (currently under review) 

A Curriculum Learning Approach for Multi-domain Text Classification Using Keyword weight Ranking

Oct 27, 2022
Zilin Yuan, Yinghui Li, Yangning Li, Rui Xie, Wei Wu, Hai-Tao Zheng

Text classification is a classic NLP task, but it has two prominent shortcomings. On the one hand, text classification is deeply domain-dependent: a classifier trained on the corpus of one domain may not perform well in another. On the other hand, text classification models require a large amount of annotated data for training, and for some domains there may not be enough annotated data. It is therefore valuable to investigate how to efficiently utilize text data from different domains to improve model performance across domains. Some multi-domain text classification models use adversarial training to extract features shared among all domains as well as the specific features of each domain. We note that the distinctness of these domain-specific features varies across domains, so in this paper we propose a curriculum learning strategy based on keyword weight ranking to improve the performance of multi-domain text classification models. Experimental results on the Amazon review and FDU-MTL datasets show that our curriculum learning strategy effectively improves the performance of multi-domain text classification models based on adversarial learning and outperforms state-of-the-art methods.
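
A rough sketch of a keyword-weight-based ordering (the exact ranking statistic in the paper may differ; TF-IDF and the top-10 cutoff here are assumptions): score each domain by the weight of its most distinctive keywords and schedule domains from most to least distinct.

```python
# Order domains for a curriculum by the weight of their top keywords (illustrative).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def rank_domains(domain_texts):
    """domain_texts: {domain: concatenated corpus}; returns domains, most distinct first."""
    names = list(domain_texts)
    tfidf = TfidfVectorizer().fit_transform([domain_texts[d] for d in names]).toarray()
    scores = {d: np.sort(row)[-10:].mean() for d, row in zip(names, tfidf)}
    return sorted(names, key=scores.get, reverse=True)

print(rank_domains({"books": "plot novel author story narrative",
                    "kitchen": "blender knife pan stainless"}))
```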

* Submitted to ICASSP2023 (currently under review) 