Alert button
Picture for Zijian Li

Zijian Li

Alert button

Knowledge-tuning Large Language Models with Structured Medical Knowledge Bases for Reliable Response Generation in Chinese

Sep 08, 2023
Haochun Wang, Sendong Zhao, Zewen Qiang, Zijian Li, Nuwa Xi, Yanrui Du, MuZhen Cai, Haoqiang Guo, Yuhan Chen, Haoming Xu, Bing Qin, Ting Liu

Figure 1 for Knowledge-tuning Large Language Models with Structured Medical Knowledge Bases for Reliable Response Generation in Chinese
Figure 2 for Knowledge-tuning Large Language Models with Structured Medical Knowledge Bases for Reliable Response Generation in Chinese
Figure 3 for Knowledge-tuning Large Language Models with Structured Medical Knowledge Bases for Reliable Response Generation in Chinese
Figure 4 for Knowledge-tuning Large Language Models with Structured Medical Knowledge Bases for Reliable Response Generation in Chinese

Large Language Models (LLMs) have demonstrated remarkable success in diverse natural language processing (NLP) tasks in general domains. However, LLMs sometimes generate responses with the hallucination about medical facts due to limited domain knowledge. Such shortcomings pose potential risks in the utilization of LLMs within medical contexts. To address this challenge, we propose knowledge-tuning, which leverages structured medical knowledge bases for the LLMs to grasp domain knowledge efficiently and facilitate reliable response generation. We also release cMedKnowQA, a Chinese medical knowledge question-answering dataset constructed from medical knowledge bases to assess the medical knowledge proficiency of LLMs. Experimental results show that the LLMs which are knowledge-tuned with cMedKnowQA, can exhibit higher levels of accuracy in response generation compared with vanilla instruction-tuning and offer a new reliable way for the domain adaptation of LLMs.

* 11 pages, 5 figures 
Viaarxiv icon

FedCiR: Client-Invariant Representation Learning for Federated Non-IID Features

Aug 30, 2023
Zijian Li, Zehong Lin, Jiawei Shao, Yuyi Mao, Jun Zhang

Federated learning (FL) is a distributed learning paradigm that maximizes the potential of data-driven models for edge devices without sharing their raw data. However, devices often have non-independent and identically distributed (non-IID) data, meaning their local data distributions can vary significantly. The heterogeneity in input data distributions across devices, commonly referred to as the feature shift problem, can adversely impact the training convergence and accuracy of the global model. To analyze the intrinsic causes of the feature shift problem, we develop a generalization error bound in FL, which motivates us to propose FedCiR, a client-invariant representation learning framework that enables clients to extract informative and client-invariant features. Specifically, we improve the mutual information term between representations and labels to encourage representations to carry essential classification knowledge, and diminish the mutual information term between the client set and representations conditioned on labels to promote representations of clients to be client-invariant. We further incorporate two regularizers into the FL framework to bound the mutual information terms with an approximate global representation distribution to compensate for the absence of the ground-truth global representation distribution, thus achieving informative and client-invariant feature extraction. To achieve global representation distribution approximation, we propose a data-free mechanism performed by the server without compromising privacy. Extensive experiments demonstrate the effectiveness of our approach in achieving client-invariant representation learning and solving the data heterogeneity issue.

Viaarxiv icon

Feature Matching Data Synthesis for Non-IID Federated Learning

Aug 09, 2023
Zijian Li, Yuchang Sun, Jiawei Shao, Yuyi Mao, Jessie Hui Wang, Jun Zhang

Figure 1 for Feature Matching Data Synthesis for Non-IID Federated Learning
Figure 2 for Feature Matching Data Synthesis for Non-IID Federated Learning
Figure 3 for Feature Matching Data Synthesis for Non-IID Federated Learning
Figure 4 for Feature Matching Data Synthesis for Non-IID Federated Learning

Federated learning (FL) has emerged as a privacy-preserving paradigm that trains neural networks on edge devices without collecting data at a central server. However, FL encounters an inherent challenge in dealing with non-independent and identically distributed (non-IID) data among devices. To address this challenge, this paper proposes a hard feature matching data synthesis (HFMDS) method to share auxiliary data besides local models. Specifically, synthetic data are generated by learning the essential class-relevant features of real samples and discarding the redundant features, which helps to effectively tackle the non-IID issue. For better privacy preservation, we propose a hard feature augmentation method to transfer real features towards the decision boundary, with which the synthetic data not only improve the model generalization but also erase the information of real features. By integrating the proposed HFMDS method with FL, we present a novel FL framework with data augmentation to relieve data heterogeneity. The theoretical analysis highlights the effectiveness of our proposed data synthesis method in solving the non-IID challenge. Simulation results further demonstrate that our proposed HFMDS-FL algorithm outperforms the baselines in terms of accuracy, privacy preservation, and computational cost on various benchmark datasets.

* 16 pages 
Viaarxiv icon

A Survey of What to Share in Federated Learning: Perspectives on Model Utility, Privacy Leakage, and Communication Efficiency

Jul 20, 2023
Jiawei Shao, Zijian Li, Wenqiang Sun, Tailin Zhou, Yuchang Sun, Lumin Liu, Zehong Lin, Jun Zhang

Federated learning (FL) has emerged as a highly effective paradigm for privacy-preserving collaborative training among different parties. Unlike traditional centralized learning, which requires collecting data from each party, FL allows clients to share privacy-preserving information without exposing private datasets. This approach not only guarantees enhanced privacy protection but also facilitates more efficient and secure collaboration among multiple participants. Therefore, FL has gained considerable attention from researchers, promoting numerous surveys to summarize the related works. However, the majority of these surveys concentrate on methods sharing model parameters during the training process, while overlooking the potential of sharing other forms of local information. In this paper, we present a systematic survey from a new perspective, i.e., what to share in FL, with an emphasis on the model utility, privacy leakage, and communication efficiency. This survey differs from previous ones due to four distinct contributions. First, we present a new taxonomy of FL methods in terms of the sharing methods, which includes three categories of shared information: model sharing, synthetic data sharing, and knowledge sharing. Second, we analyze the vulnerability of different sharing methods to privacy attacks and review the defense mechanisms that provide certain privacy guarantees. Third, we conduct extensive experiments to compare the performance and communication overhead of various sharing methods in FL. Besides, we assess the potential privacy leakage through model inversion and membership inference attacks, while comparing the effectiveness of various defense approaches. Finally, we discuss potential deficiencies in current methods and outline future directions for improvement.

Viaarxiv icon

TNPAR: Topological Neural Poisson Auto-Regressive Model for Learning Granger Causal Structure from Event Sequences

Jun 25, 2023
Ruichu Cai, Yuequn Liu, Wei Chen, Jie Qiao, Yuguang Yan, Zijian Li, Keli Zhang, Zhifeng Hao

Figure 1 for TNPAR: Topological Neural Poisson Auto-Regressive Model for Learning Granger Causal Structure from Event Sequences
Figure 2 for TNPAR: Topological Neural Poisson Auto-Regressive Model for Learning Granger Causal Structure from Event Sequences
Figure 3 for TNPAR: Topological Neural Poisson Auto-Regressive Model for Learning Granger Causal Structure from Event Sequences
Figure 4 for TNPAR: Topological Neural Poisson Auto-Regressive Model for Learning Granger Causal Structure from Event Sequences

Learning Granger causality from event sequences is a challenging but essential task across various applications. Most existing methods rely on the assumption that event sequences are independent and identically distributed (i.i.d.). However, this i.i.d. assumption is often violated due to the inherent dependencies among the event sequences. Fortunately, in practice, we find these dependencies can be modeled by a topological network, suggesting a potential solution to the non-i.i.d. problem by introducing the prior topological network into Granger causal discovery. This observation prompts us to tackle two ensuing challenges: 1) how to model the event sequences while incorporating both the prior topological network and the latent Granger causal structure, and 2) how to learn the Granger causal structure. To this end, we devise a two-stage unified topological neural Poisson auto-regressive model. During the generation stage, we employ a variant of the neural Poisson process to model the event sequences, considering influences from both the topological network and the Granger causal structure. In the inference stage, we formulate an amortized inference algorithm to infer the latent Granger causal structure. We encapsulate these two stages within a unified likelihood function, providing an end-to-end framework for this task.

Viaarxiv icon

Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection

Jul 12, 2022
Xubin Zhong, Changxing Ding, Zijian Li, Shaoli Huang

Figure 1 for Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection
Figure 2 for Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection
Figure 3 for Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection
Figure 4 for Towards Hard-Positive Query Mining for DETR-based Human-Object Interaction Detection

Human-Object Interaction (HOI) detection is a core task for high-level image understanding. Recently, Detection Transformer (DETR)-based HOI detectors have become popular due to their superior performance and efficient structure. However, these approaches typically adopt fixed HOI queries for all testing images, which is vulnerable to the location change of objects in one specific image. Accordingly, in this paper, we propose to enhance DETR's robustness by mining hard-positive queries, which are forced to make correct predictions using partial visual cues. First, we explicitly compose hard-positive queries according to the ground-truth (GT) position of labeled human-object pairs for each training image. Specifically, we shift the GT bounding boxes of each labeled human-object pair so that the shifted boxes cover only a certain portion of the GT ones. We encode the coordinates of the shifted boxes for each labeled human-object pair into an HOI query. Second, we implicitly construct another set of hard-positive queries by masking the top scores in cross-attention maps of the decoder layers. The masked attention maps then only cover partial important cues for HOI predictions. Finally, an alternate strategy is proposed that efficiently combines both types of hard queries. In each iteration, both DETR's learnable queries and one selected type of hard-positive queries are adopted for loss computation. Experimental results show that our proposed approach can be widely applied to existing DETR-based HOI detectors. Moreover, we consistently achieve state-of-the-art performance on three benchmarks: HICO-DET, V-COCO, and HOI-A. Code is available at https://github.com/MuchHair/HQM.

* Accepted by ECCV2022 
Viaarxiv icon

Federated Learning with GAN-based Data Synthesis for Non-IID Clients

Jun 11, 2022
Zijian Li, Jiawei Shao, Yuyi Mao, Jessie Hui Wang, Jun Zhang

Figure 1 for Federated Learning with GAN-based Data Synthesis for Non-IID Clients
Figure 2 for Federated Learning with GAN-based Data Synthesis for Non-IID Clients
Figure 3 for Federated Learning with GAN-based Data Synthesis for Non-IID Clients
Figure 4 for Federated Learning with GAN-based Data Synthesis for Non-IID Clients

Federated learning (FL) has recently emerged as a popular privacy-preserving collaborative learning paradigm. However, it suffers from the non-independent and identically distributed (non-IID) data among clients. In this paper, we propose a novel framework, named Synthetic Data Aided Federated Learning (SDA-FL), to resolve this non-IID challenge by sharing synthetic data. Specifically, each client pretrains a local generative adversarial network (GAN) to generate differentially private synthetic data, which are uploaded to the parameter server (PS) to construct a global shared synthetic dataset. To generate confident pseudo labels for the synthetic dataset, we also propose an iterative pseudo labeling mechanism performed by the PS. A combination of the local private dataset and synthetic dataset with confident pseudo labels leads to nearly identical data distributions among clients, which improves the consistency among local models and benefits the global aggregation. Extensive experiments evidence that the proposed framework outperforms the baseline methods by a large margin in several benchmark datasets under both the supervised and semi-supervised settings.

* 8 pages. To be published in the International Workshop on Trustworthy Federated Learning in Conjunction with IJCAI 2022 (FL-IJCAI'22) 
Viaarxiv icon

Time-Series Domain Adaptation via Sparse Associative Structure Alignment: Learning Invariance and Variance

May 07, 2022
Zijian Li, Ruichu Cai, Jiawei Chen, Yuguan Yan, Wei Chen, Keli Zhang, Junjian Ye

Figure 1 for Time-Series Domain Adaptation via Sparse Associative Structure Alignment: Learning Invariance and Variance
Figure 2 for Time-Series Domain Adaptation via Sparse Associative Structure Alignment: Learning Invariance and Variance
Figure 3 for Time-Series Domain Adaptation via Sparse Associative Structure Alignment: Learning Invariance and Variance
Figure 4 for Time-Series Domain Adaptation via Sparse Associative Structure Alignment: Learning Invariance and Variance

Domain adaptation on time-series data is often encountered in the industry but received limited attention in academia. Most of the existing domain adaptation methods for time-series data borrow the ideas from the existing methods for non-time series data to extract the domain-invariant representation. However, two peculiar difficulties to time-series data have not been solved. 1) It is not a trivial task to model the domain-invariant and complex dependence among different timestamps. 2) The domain-variant information is important but how to leverage them is almost underexploited. Fortunately, the stableness of causal structures among different domains inspires us to explore the structures behind the time-series data. Based on this inspiration, we investigate the domain-invariant unweighted sparse associative structures and the domain-variant strengths of the structures. To achieve this, we propose Sparse Associative structure alignment by learning Invariance and Variance (SASA-IV in short), a model that simultaneously aligns the invariant unweighted spare associative structures and considers the variant information for time-series unsupervised domain adaptation. Technologically, we extract the domain-invariant unweighted sparse associative structures with a unidirectional alignment restriction and embed the domain-variant strengths via a well-designed autoregressive module. Experimental results not only testify that our model yields state-of-the-art performance on three real-world datasets but also provide some insightful discoveries on knowledge transfer.

* arXiv admin note: text overlap with arXiv:2012.11797 
Viaarxiv icon

HL-Net: Heterophily Learning Network for Scene Graph Generation

May 04, 2022
Xin Lin, Changxing Ding, Yibing Zhan, Zijian Li, Dacheng Tao

Figure 1 for HL-Net: Heterophily Learning Network for Scene Graph Generation
Figure 2 for HL-Net: Heterophily Learning Network for Scene Graph Generation
Figure 3 for HL-Net: Heterophily Learning Network for Scene Graph Generation
Figure 4 for HL-Net: Heterophily Learning Network for Scene Graph Generation

Scene graph generation (SGG) aims to detect objects and predict their pairwise relationships within an image. Current SGG methods typically utilize graph neural networks (GNNs) to acquire context information between objects/relationships. Despite their effectiveness, however, current SGG methods only assume scene graph homophily while ignoring heterophily. Accordingly, in this paper, we propose a novel Heterophily Learning Network (HL-Net) to comprehensively explore the homophily and heterophily between objects/relationships in scene graphs. More specifically, HL-Net comprises the following 1) an adaptive reweighting transformer module, which adaptively integrates the information from different layers to exploit both the heterophily and homophily in objects; 2) a relationship feature propagation module that efficiently explores the connections between relationships by considering heterophily in order to refine the relationship representation; 3) a heterophily-aware message-passing scheme to further distinguish the heterophily and homophily between objects/relationships, thereby facilitating improved message passing in graphs. We conducted extensive experiments on two public datasets: Visual Genome (VG) and Open Images (OI). The experimental results demonstrate the superiority of our proposed HL-Net over existing state-of-the-art approaches. In more detail, HL-Net outperforms the second-best competitors by 2.1$\%$ on the VG dataset for scene graph classification and 1.2$\%$ on the IO dataset for the final score. Code is available at https://github.com/siml3/HL-Net.

* Accepted to CVPR 2022 
Viaarxiv icon

Motif Graph Neural Network

Jan 17, 2022
Xuexin Chen, Ruichu Cai, Yuan Fang, Min Wu, Zijian Li, Zhifeng Hao

Figure 1 for Motif Graph Neural Network
Figure 2 for Motif Graph Neural Network
Figure 3 for Motif Graph Neural Network
Figure 4 for Motif Graph Neural Network

Graphs can model complicated interactions between entities, which naturally emerge in many important applications. These applications can often be cast into standard graph learning tasks, in which a crucial step is to learn low-dimensional graph representations. Graph neural networks (GNNs) are currently the most popular model in graph embedding approaches. However, standard GNNs in the neighborhood aggregation paradigm suffer from limited discriminative power in distinguishing \emph{high-order} graph structures as opposed to \emph{low-order} structures. To capture high-order structures, researchers have resorted to motifs and developed motif-based GNNs. However, existing motif-based GNNs still often suffer from less discriminative power on high-order structures. To overcome the above limitations, we propose Motif Graph Neural Network (MGNN), a novel framework to better capture high-order structures, hinging on our proposed motif redundancy minimization operator and injective motif combination. First, MGNN produces a set of node representations w.r.t. each motif. The next phase is our proposed redundancy minimization among motifs which compares the motifs with each other and distills the features unique to each motif. Finally, MGNN performs the updating of node representations by combining multiple representations from different motifs. In particular, to enhance the discriminative power, MGNN utilizes an injective function to combine the representations w.r.t. different motifs. We further show that our proposed architecture increases the expressive power of GNNs with a theoretical analysis. We demonstrate that MGNN outperforms state-of-the-art methods on seven public benchmarks on both node classification and graph classification tasks.

Viaarxiv icon