Xiping Hu

Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models

Nov 14, 2023
Shiwen Ni, Dingwei Chen, Chengming Li, Xiping Hu, Ruifeng Xu, Min Yang

Recently, Large Language Models (LLMs) have demonstrated impressive text understanding and generation capabilities. However, even strong LLMs may still learn incorrect knowledge from the training corpus, as well as knowledge that becomes outdated over time. Directly fine-tuning again on data containing new knowledge may be ineffective at updating knowledge because of the conflict between the old and the new. In this paper, we propose a new fine-tuning paradigm called F-Learning (Forgetting before Learning), which uses parametric arithmetic to first forget old knowledge and then learn new knowledge. Experimental results on two publicly available datasets demonstrate that F-Learning clearly improves the knowledge-updating performance of both full fine-tuning and LoRA fine-tuning. Moreover, we find that forgetting old knowledge by subtracting the LoRA parameters achieves an effect similar to subtracting the full fine-tuning parameters, and sometimes even surpasses it significantly.

* 8 pages, 2 figures, 2 tables 
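
As a rough illustration of the parameter-arithmetic idea described in the abstract, the snippet below subtracts a scaled "old-knowledge" fine-tuning delta from the base weights before the new-knowledge fine-tuning stage. The function name, the `lam` coefficient, and the use of plain state dicts are illustrative assumptions, not the paper's implementation.

```python
def forget_old_knowledge(base_sd, old_ft_sd, lam=0.7):
    """Parameter-arithmetic "forgetting" step (illustrative sketch).

    base_sd   : state_dict of the original pre-trained model
    old_ft_sd : state_dict after fine-tuning on the *old* knowledge
    lam       : hypothetical forgetting-strength coefficient
    """
    forgotten = {}
    for name, w in base_sd.items():
        delta = old_ft_sd[name] - w          # update attributable to the old knowledge
        forgotten[name] = w - lam * delta    # subtract it to "forget" before learning
    return forgotten

# usage sketch: load the "forgotten" weights, then fine-tune (full or LoRA)
# on the data that contains the new knowledge:
# model.load_state_dict(forget_old_knowledge(base_sd, old_ft_sd))
```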
Viaarxiv icon

Expression Syntax Information Bottleneck for Math Word Problems

Oct 24, 2023
Jing Xiong, Chengming Li, Min Yang, Xiping Hu, Bin Hu

Math Word Problem (MWP) solving aims to automatically answer mathematical questions posed in text. Previous studies tend to design complex models that capture additional information in the original text so that the model can gain more comprehensive features. In this paper, we turn our attention in the opposite direction and study how to discard redundant features that carry spurious correlations for MWP. To this end, we design an Expression Syntax Information Bottleneck method for MWP (called ESIB) based on the variational information bottleneck, which extracts the essential features of the expression syntax tree while filtering out latent-specific redundancy containing syntax-irrelevant features. The key idea of ESIB is to encourage multiple models to predict the same expression syntax tree for different representations of the same problem via mutual learning, so as to capture information consistent with the expression syntax tree and discard latent-specific redundancy. To improve the generalization ability of the model and generate more diverse expressions, we design a self-distillation loss that encourages the model to rely more on the expression syntax information in the latent space. Experimental results on two large-scale benchmarks show that our model not only achieves state-of-the-art results but also generates more diverse solutions. The code is available.

* This paper has been accepted by SIGIR 2022. The code can be found at https://github.com/menik1126/math_ESIB 
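
To make the mutual-learning idea concrete, here is a minimal sketch in which two peer models fit the gold expression tree while a symmetric KL term pulls their output distributions together. The `beta` weight and function names are assumptions, and ESIB's variational information-bottleneck and self-distillation terms are not shown.

```python
import torch.nn.functional as F

def mutual_syntax_loss(logits_a, logits_b, tree_targets, beta=0.1):
    """Sketch of a mutual-learning objective: two peers each fit the gold
    expression syntax tree, and a symmetric KL term pulls their predicted
    distributions together so that only syntax-consistent information survives.

    logits_a, logits_b : (batch, seq_len, vocab) predictions of the two peers
    tree_targets       : (batch, seq_len) gold expression-tree token ids
    beta               : hypothetical weight of the consistency term
    """
    ce = F.cross_entropy(logits_a.flatten(0, 1), tree_targets.flatten()) \
       + F.cross_entropy(logits_b.flatten(0, 1), tree_targets.flatten())
    log_p = F.log_softmax(logits_a, dim=-1)
    log_q = F.log_softmax(logits_b, dim=-1)
    sym_kl = 0.5 * (F.kl_div(log_p, log_q, log_target=True, reduction="batchmean")
                    + F.kl_div(log_q, log_p, log_target=True, reduction="batchmean"))
    return ce + beta * sym_kl
```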

Accelerating Wireless Federated Learning via Nesterov's Momentum and Distributed Principle Component Analysis

Mar 31, 2023
Yanjie Dong, Luya Wang, Yuanfang Chi, Jia Wang, Haijun Zhang, Fei Richard Yu, Victor C. M. Leung, Xiping Hu

We investigate a wireless federated learning system in which a server and workers exchange uncoded information via orthogonal wireless channels. Since the workers frequently upload local gradients to the server over bandwidth-limited channels, the uplink transmission from the workers to the server becomes a communication bottleneck. Therefore, a one-shot distributed principal component analysis (PCA) is leveraged to reduce the dimension of the uploaded gradients and relieve this bottleneck. A PCA-based wireless federated learning (PCA-WFL) algorithm and its accelerated version (PCA-AWFL) are proposed based on the low-dimensional gradients and Nesterov's momentum. For non-convex loss functions, a finite-time analysis is performed to quantify the impact of system hyper-parameters on the convergence of the PCA-WFL and PCA-AWFL algorithms. The PCA-AWFL algorithm is proven to converge faster than PCA-WFL. Moreover, the convergence rates of the two algorithms quantitatively reveal a linear speedup with respect to the number of workers over vanilla gradient descent. Numerical results demonstrate the improved convergence of the proposed PCA-WFL and PCA-AWFL algorithms over the benchmarks.
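
A toy sketch of the uplink-compression idea: each worker projects its flattened gradient onto k principal directions (from the one-shot distributed PCA), and the server reconstructs, averages, and applies a Nesterov-momentum update. The learning rate, momentum value, and function names are illustrative, and the exact update differs from the paper's analysis.

```python
import numpy as np

def worker_compress(grad_vec, components):
    """Worker side: project the flattened local gradient onto the top-k
    principal directions (k x d matrix from the one-shot distributed PCA),
    so only k coefficients travel over the bandwidth-limited uplink."""
    return components @ grad_vec                              # shape (k,)

def server_nesterov_step(w, velocity, coeff_list, components, lr=0.05, momentum=0.9):
    """Server side: reconstruct and average the compressed gradients, then
    apply a Nesterov-momentum update to the global model (sketch)."""
    avg_grad = components.T @ np.mean(coeff_list, axis=0)     # back to d dimensions
    velocity = momentum * velocity + avg_grad
    w = w - lr * (avg_grad + momentum * velocity)             # Nesterov-style look-ahead
    return w, velocity
```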


Self-consistent Reasoning For Solving Math Word Problems

Oct 27, 2022
Jing Xiong, Zhongwei Wan, Xiping Hu, Min Yang, Chengming Li

Math word problem (MWP) solving is the task of automatically deriving a solution expression from a math problem given in text. Previous studies suffer from spurious correlations between the input text and the output expression. To mitigate this issue, we propose a self-consistent reasoning framework called SCR, which adopts a pruning strategy to correct the output distribution shift and thereby implicitly fix spuriously correlated samples. Specifically, we first obtain a sub-network by pruning a roberta2tree model, so that the gap between the output distributions of the original roberta2tree model and the pruned sub-network exposes spuriously correlated samples. Then, we calibrate the output distribution shift by applying a symmetric Kullback-Leibler divergence to alleviate spurious correlations. In addition, SCR generates equivalent expressions, thereby capturing the logic of the original text rather than relying on hints from it. Extensive experiments on two large-scale benchmarks demonstrate that our model substantially outperforms strong baseline methods.

* Submitted to IEEE ICASSP 2023 
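
A minimal sketch of the calibration idea: the symmetric KL divergence between the full roberta2tree model and its pruned sub-network is added to the standard expression loss. The `alpha` weight and function names are assumptions; the pruning procedure itself is not shown.

```python
import torch.nn.functional as F

def symmetric_kl(logits_full, logits_pruned):
    """Symmetric KL between the full roberta2tree model and its pruned
    sub-network; a large gap flags a potentially spuriously correlated sample."""
    log_p = F.log_softmax(logits_full, dim=-1)
    log_q = F.log_softmax(logits_pruned, dim=-1)
    return 0.5 * (F.kl_div(log_p, log_q, log_target=True, reduction="batchmean")
                  + F.kl_div(log_q, log_p, log_target=True, reduction="batchmean"))

def scr_style_loss(logits_full, logits_pruned, targets, alpha=1.0):
    """Sketch: fit the gold expression with the full model while aligning the
    two output distributions to calibrate the shift (alpha is illustrative)."""
    return F.cross_entropy(logits_full, targets) + alpha * symmetric_kl(logits_full, logits_pruned)
```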

Data Augmentation for Depression Detection Using Skeleton-Based Gait Information

Jan 04, 2022
Jingjing Yang, Haifeng Lu, Chengming Li, Xiping Hu, Bin Hu

In recent years, the incidence of depression has been rising rapidly worldwide, but large-scale depression screening remains challenging. Gait analysis provides a non-contact, low-cost, and efficient method for early depression screening. However, gait-based early screening of depression lacks sufficient effective sample data. In this paper, we propose a skeleton data augmentation method for assessing the risk of depression. First, we propose five techniques to augment skeleton data and apply them to depression and emotion datasets. Then, we divide the augmentation methods into two types (non-noise augmentation and noise augmentation) based on mutual information and classification accuracy. Finally, we explore which augmentation strategies capture the characteristics of human skeleton data more effectively. Experimental results show that detection performance is determined by how many properties of the raw skeleton data the augmented training set retains. Specifically, rotation augmentation and channel-mask augmentation raise depression detection accuracy to 92.15% and 91.34%, respectively.

* 10 pages, 10 figures 
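
The two best-performing augmentations can be sketched in a few lines of NumPy; the rotation range, mask probability, and rotation axis below are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def rotate_skeleton(seq, max_deg=15.0):
    """Rotation augmentation (sketch): rotate every 3D joint of a gait
    sequence around the vertical (y) axis by a small random angle.
    seq: array of shape (frames, joints, 3)."""
    theta = np.deg2rad(np.random.uniform(-max_deg, max_deg))
    c, s = np.cos(theta), np.sin(theta)
    rot_y = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return seq @ rot_y.T

def channel_mask(seq, p=0.1):
    """Channel-mask augmentation (sketch): zero out a random coordinate
    channel (x, y, or z) for a fraction of joints, keeping the gait structure."""
    out = seq.copy()
    mask = np.random.rand(seq.shape[1], seq.shape[2]) < p   # (joints, 3) boolean mask
    out[:, mask] = 0.0
    return out
```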

SM-SGE: A Self-Supervised Multi-Scale Skeleton Graph Encoding Framework for Person Re-Identification

Jul 05, 2021
Haocong Rao, Xiping Hu, Jun Cheng, Bin Hu

Person re-identification via 3D skeletons is an emerging topic with great potential in security-critical applications. Existing methods typically learn body and motion features from body-joint trajectories, but they lack a systematic way to model body structure and the underlying relations of body components beyond the scale of body joints. In this paper, we propose, for the first time, a Self-supervised Multi-scale Skeleton Graph Encoding (SM-SGE) framework that comprehensively models the human body, component relations, and skeleton dynamics from unlabeled skeleton graphs of various scales to learn an effective skeleton representation for person Re-ID. Specifically, we first devise multi-scale skeleton graphs with coarse-to-fine human body partitions, which enable us to model body structure and skeleton dynamics at multiple levels. Second, to mine the inherent correlations between body components in skeletal motion, we propose a multi-scale graph relation network to learn structural relations between adjacent body-component nodes and collaborative relations among nodes of different scales, so as to capture more discriminative skeleton graph features. Last, we propose a novel multi-scale skeleton reconstruction mechanism that enables our framework to encode skeleton dynamics and high-level semantics from unlabeled skeleton graphs, which encourages learning a discriminative skeleton representation for person Re-ID. Extensive experiments show that SM-SGE outperforms most state-of-the-art skeleton-based methods. We further demonstrate its effectiveness on 3D skeleton data estimated from large-scale RGB videos. Our code is available at https://github.com/Kali-Hac/SM-SGE.

* Accepted at ACMMM 2021 Main Track. Sole copyright holder is ACMMM. Codes are available at https://github.com/Kali-Hac/SM-SGE 
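
One plausible way to realize the coarse-to-fine partitioning is to pool joint features into higher-level body components and derive a component-level adjacency, as in the sketch below; the partition mapping and pooling scheme are assumptions, and SM-SGE's learned relation networks and reconstruction mechanism are not shown.

```python
import numpy as np

def build_coarse_scale(x_joint, adj_joint, partition):
    """Sketch of one coarse-to-fine step in a multi-scale skeleton graph:
    average-pool joint features into higher-level body components and derive
    the component-level adjacency from the joint-level one.

    x_joint   : (num_joints, feat) joint-level node features
    adj_joint : (num_joints, num_joints) joint-level adjacency
    partition : list mapping each joint index to a component index
    """
    partition = np.asarray(partition)
    num_comp = partition.max() + 1
    # pooling matrix: component i averages the joints assigned to it
    pool = np.zeros((num_comp, len(partition)))
    for j, c in enumerate(partition):
        pool[c, j] = 1.0
    pool /= pool.sum(axis=1, keepdims=True)
    x_comp = pool @ x_joint
    # two components are connected if any of their joints are connected
    member = (pool > 0).astype(float)
    adj_comp = ((member @ adj_joint @ member.T) > 0).astype(float)
    np.fill_diagonal(adj_comp, 0.0)
    return x_comp, adj_comp
```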

More than Encoder: Introducing Transformer Decoder to Upsample

Jun 20, 2021
Yijiang Li, Wentian Cai, Ying Gao, Xiping Hu

General segmentation models downsample images and then upsample them to restore resolution for pixel-level prediction. In this scheme, the upsampling technique is vital for preserving information and achieving better performance. In this paper, we present a new upsampling approach, Attention Upsample (AU), which can serve as a general upsampling method and be incorporated into any segmentation model with lateral connections. AU leverages pixel-level attention to model long-range dependencies and global information for better reconstruction. It consists of an Attention Decoder (AD) and a bilinear upsample used as a residual connection to complement the upsampled features. AD adopts the transformer decoder idea, upsampling features conditioned on local and detailed information from the contracting path. Moreover, considering the extensive memory and computation cost of pixel-level attention, we further propose a window attention scheme that restricts attention computation to local windows instead of the global range. Incorporating window attention, we denote our decoder as the Window Attention Decoder (WAD) and our upsampling method as Window Attention Upsample (WAU). We test our method on the classic U-Net structure, with lateral connections delivering information from the contracting path, and achieve state-of-the-art performance on the Synapse (80.30 DSC and 23.12 HD) and MSD Brain (74.75 DSC) datasets.

* 19 pages, 7 figures 
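
A compact sketch of the attention-upsample idea: bilinearly upsampled decoder features act as queries, lateral skip features as keys/values, and the bilinear path is kept as a residual. Global multi-head attention is used here for brevity (the paper restricts it to windows), and the head count and normalization placement are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class AttentionUpsample(nn.Module):
    """Sketch of an attention-based upsampling block for U-Net-style models."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, low_res, lateral):
        # low_res: (B, C, h, w) decoder features; lateral: (B, C, 2h, 2w) skip features
        up = F.interpolate(low_res, size=lateral.shape[-2:], mode="bilinear",
                           align_corners=False)
        q = up.flatten(2).transpose(1, 2)        # (B, HW, C) queries from upsampled path
        kv = lateral.flatten(2).transpose(1, 2)  # (B, HW, C) keys/values from skip path
        out, _ = self.attn(self.norm(q), kv, kv)
        out = (out + q).transpose(1, 2).reshape_as(lateral)  # bilinear residual kept
        return out
```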

Multi-Level Graph Encoding with Structural-Collaborative Relation Learning for Skeleton-Based Person Re-Identification

Jun 06, 2021
Haocong Rao, Shihao Xu, Xiping Hu, Jun Cheng, Bin Hu

Skeleton-based person re-identification (Re-ID) is an emerging open topic of great value for safety-critical applications. Existing methods typically extract hand-crafted features or model skeleton dynamics from the trajectories of body joints, but they rarely explore the valuable relation information contained in body structure or motion. To fully explore body relations, we construct graphs to model human skeletons at different levels and, for the first time, propose a Multi-level Graph encoding approach with Structural-Collaborative Relation learning (MG-SCR) to encode discriminative graph features for person Re-ID. Specifically, considering that structurally-connected body components are highly correlated in a skeleton, we first propose a multi-head structural relation layer to learn different relations between neighboring body-component nodes in the graphs, which helps aggregate key correlative features into effective node representations. Second, inspired by the fact that body-component collaboration in walking usually carries recognizable patterns, we propose a cross-level collaborative relation layer to infer collaboration between components at different levels, so as to capture more discriminative skeleton graph features. Finally, to enhance graph dynamics encoding, we propose a novel self-supervised sparse sequential prediction task for model pre-training, which facilitates encoding high-level graph semantics for person Re-ID. MG-SCR outperforms state-of-the-art skeleton-based methods, and it achieves superior performance to many multi-modal methods that utilize extra RGB or depth features. Our code is available at https://github.com/Kali-Hac/MG-SCR.

* In IJCAI, 2021  
* Accepted at IJCAI 2021 Main Track. Sole copyright holder is IJCAI. Codes are available at https://github.com/Kali-Hac/MG-SCR 
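
A single-head sketch of the structural relation idea: each node attends only to its structurally connected neighbors, with the skeleton adjacency (assumed here to include self-loops) acting as the attention mask. Class and parameter names are illustrative, and the multi-head and cross-level collaborative layers are not shown.

```python
import torch.nn as nn
import torch.nn.functional as F

class StructuralRelationHead(nn.Module):
    """Sketch of one head of a structural relation layer: masked attention
    over structurally connected body-component nodes."""
    def __init__(self, dim):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))

    def forward(self, x, adj):
        # x: (B, N, dim) node features; adj: (N, N) 0/1 adjacency with self-loops
        scores = self.q(x) @ self.k(x).transpose(1, 2) / x.size(-1) ** 0.5
        scores = scores.masked_fill(adj.unsqueeze(0) == 0, float("-inf"))
        return F.softmax(scores, dim=-1) @ self.v(x)   # aggregate neighbor features
```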

Prototypical Contrast and Reverse Prediction: Unsupervised Skeleton Based Action Recognition

Nov 14, 2020
Shihao Xu, Haocong Rao, Xiping Hu, Bin Hu

In this paper, we focus on unsupervised representation learning for skeleton-based action recognition. Existing approaches usually learn action representations by sequential prediction, but they suffer from an inability to fully learn semantic information. To address this limitation, we propose a novel framework named Prototypical Contrast and Reverse Prediction (PCRP), which not only uses reverse sequential prediction to learn low-level information (e.g., body posture at every frame) and high-level patterns (e.g., motion order), but also devises action prototypes to implicitly encode the semantic similarity shared among sequences. In general, we regard action prototypes as latent variables and formulate PCRP as an expectation-maximization task. Specifically, PCRP iteratively runs (1) an E-step, which determines the distribution of prototypes by clustering the action encodings from the encoder, and (2) an M-step, which optimizes the encoder by minimizing the proposed ProtoMAE loss, simultaneously pulling each action encoding closer to its assigned prototype and performing the reverse prediction task. Extensive experiments on the N-UCLA, NTU 60, and NTU 120 datasets show that PCRP outperforms state-of-the-art unsupervised methods and even achieves superior performance over some supervised methods. Code is available at https://github.com/Mikexu007/PCRP.

* Codes are available at https://github.com/Mikexu007/PCRP 
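
The EM loop can be sketched as a k-means E-step over sequence encodings and an M-step term that pulls each encoding toward its assigned prototype. The prototype count, temperature, and use of scikit-learn's KMeans are assumptions, and the reverse-prediction term of ProtoMAE is omitted.

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def e_step(encodings, num_prototypes=50):
    """E-step sketch: cluster sequence encodings to obtain action prototypes
    and the prototype assignment of every sequence."""
    km = KMeans(n_clusters=num_prototypes, n_init=10).fit(
        encodings.detach().cpu().numpy())
    prototypes = torch.tensor(km.cluster_centers_, dtype=encodings.dtype,
                              device=encodings.device)
    assignments = torch.tensor(km.labels_, dtype=torch.long, device=encodings.device)
    return prototypes, assignments

def prototype_pull_loss(encodings, prototypes, assignments, temperature=0.07):
    """M-step sketch: a softmax over encoding-prototype similarities pulls each
    encoding toward its assigned prototype (one term of a ProtoMAE-style loss)."""
    logits = F.normalize(encodings, dim=-1) @ F.normalize(prototypes, dim=-1).T
    return F.cross_entropy(logits / temperature, assignments)
```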

A Self-Supervised Gait Encoding Approach with Locality-Awareness for 3D Skeleton Based Person Re-Identification

Sep 05, 2020
Haocong Rao, Siqi Wang, Xiping Hu, Mingkui Tan, Yi Guo, Jun Cheng, Bin Hu, Xinwang Liu

Person re-identification (Re-ID) via gait features within 3D skeleton sequences is a newly emerging topic with several advantages. Existing solutions either rely on hand-crafted descriptors or on supervised gait representation learning. This paper proposes a self-supervised gait encoding approach that can leverage unlabeled skeleton data to learn gait representations for person Re-ID. Specifically, we first create self-supervision by learning to reconstruct unlabeled skeleton sequences in reverse order, which involves richer high-level semantics and yields better gait representations. Other pretext tasks are also explored to further improve self-supervised learning. Second, inspired by the fact that the continuity of motion endows adjacent skeletons within a sequence, and temporally consecutive sequences, with higher correlations (referred to as locality in 3D skeleton data), we propose a locality-aware attention mechanism and a locality-aware contrastive learning scheme, which aim to preserve locality-awareness at the intra-sequence and inter-sequence levels, respectively, during self-supervised learning. Last, with the context vectors learned by our locality-aware attention mechanism and contrastive learning scheme, a novel feature named Contrastive Attention-based Gait Encodings (CAGEs) is designed to represent gait effectively. Empirical evaluations show that our approach significantly outperforms skeleton-based counterparts by 15-40% in Rank-1 accuracy, and it even achieves superior performance to numerous multi-modal methods that use extra RGB or depth information. Our code is available at https://github.com/Kali-Hac/Locality-Awareness-SGE.

* Codes are available at https://github.com/Kali-Hac/Locality-Awareness-SGE. This article is an extended version of our conference (IJCAI-2020) paper at https://www.ijcai.org/proceedings/2020/0125. arXiv admin note: substantial text overlap with arXiv:2008.09435 
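
A bare-bones sketch of the reverse-reconstruction pretext task: an encoder-decoder is trained to reproduce the input skeleton sequence in reverse frame order. The GRU architecture and hidden size are assumptions; the locality-aware attention and contrastive scheme of the paper are not shown.

```python
import torch
import torch.nn as nn

class ReverseReconstruction(nn.Module):
    """Sketch of the reverse-reconstruction pretext task: encode a skeleton
    sequence, then reconstruct it in reverse frame order."""
    def __init__(self, joint_dim, hidden=256):
        super().__init__()
        self.encoder = nn.GRU(joint_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, joint_dim, batch_first=True)

    def forward(self, seq):
        # seq: (B, T, joint_dim) flattened 3D joint coordinates per frame
        enc, _ = self.encoder(seq)
        recon, _ = self.decoder(enc)
        target = torch.flip(seq, dims=[1])      # same frames, reverse order
        return nn.functional.mse_loss(recon, target)
```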