Shenda Hong

TEST: Text Prototype Aligned Embedding to Activate LLM's Ability for Time Series

Aug 16, 2023
Chenxi Sun, Yaliang Li, Hongyan Li, Shenda Hong

This work summarizes two strategies for completing time-series (TS) tasks with today's large language models (LLMs): LLM-for-TS, which designs and trains a fundamental large model for TS data, and TS-for-LLM, which enables a pre-trained LLM to handle TS data. Considering insufficient data accumulation, limited resources, and the need for semantic context, this work focuses on TS-for-LLM methods, where we aim to activate the LLM's ability for TS data by designing a TS embedding method suitable for LLMs. The proposed method is named TEST. It first tokenizes TS, builds an encoder that embeds the tokens via instance-wise, feature-wise, and text-prototype-aligned contrast, then creates prompts to make the LLM more receptive to the embeddings, and finally implements TS tasks. Experiments are carried out on TS classification and forecasting tasks using 8 LLMs of different structures and sizes. Although the results do not significantly outperform current SOTA models customized for TS tasks, treating the LLM as a pattern machine endows it with the ability to process TS data without compromising its language ability. This paper is intended as a foundational work that will inspire further research.
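The text-prototype-aligned contrast described above can be read as an InfoNCE-style objective that pulls each TS token embedding toward an assigned text-prototype embedding and pushes it away from the others. A minimal NumPy sketch under that reading (the function name, toy dimensions, and random assignments are illustrative, not from the paper):

```python
import numpy as np

def prototype_alignment_loss(ts_emb, text_protos, assignments, temperature=0.1):
    """InfoNCE-style contrast: pull each TS embedding toward its assigned
    text prototype, push it away from the other prototypes."""
    e = ts_emb / np.linalg.norm(ts_emb, axis=1, keepdims=True)
    p = text_protos / np.linalg.norm(text_protos, axis=1, keepdims=True)
    logits = (e @ p.T) / temperature                 # (N, K) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(ts_emb)), assignments].mean()

rng = np.random.default_rng(0)
ts_emb = rng.normal(size=(8, 16))         # 8 embedded TS tokens
text_protos = rng.normal(size=(4, 16))    # 4 text-prototype embeddings
assignments = rng.integers(0, 4, size=8)  # prototype each token aligns with
loss = prototype_alignment_loss(ts_emb, text_protos, assignments)
```

Minimizing this loss moves the TS embeddings into the region of the LLM's text embedding space spanned by the prototypes, which is the alignment the abstract describes.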

* 10 pages, 6 figures 

VQGraph: Graph Vector-Quantization for Bridging GNNs and MLPs

Aug 04, 2023
Ling Yang, Ye Tian, Minkai Xu, Zhongyi Liu, Shenda Hong, Wei Qu, Wentao Zhang, Bin Cui, Muhan Zhang, Jure Leskovec

Graph Neural Networks (GNNs) conduct message passing, which aggregates information from local neighbors to update node representations. Such message passing leads to scalability issues in practical latency-constrained applications. To address this issue, recent methods adopt knowledge distillation (KD) to learn computationally efficient multi-layer perceptrons (MLPs) by mimicking the output of GNNs. However, the existing GNN representation space may not be expressive enough to represent the diverse local structures of the underlying graph, which limits knowledge transfer from GNN to MLP. Here we present VQGraph, a novel framework that learns a powerful graph representation space for bridging GNNs and MLPs. We adopt the encoder of a variant of the vector-quantized variational autoencoder (VQ-VAE) as a structure-aware graph tokenizer, which explicitly represents nodes with diverse local structures as discrete tokens and constitutes a meaningful codebook. Equipped with the learned codebook, we propose a new token-based distillation objective built on soft token assignments to sufficiently transfer structural knowledge from GNN to MLP. Extensive experiments and analyses demonstrate the strong performance of VQGraph: we achieve new state-of-the-art results on GNN-MLP distillation in both transductive and inductive settings across seven graph datasets. VQGraph infers 828x faster than GNNs while improving accuracy over GNNs and stand-alone MLPs by 3.90% and 28.05% on average, respectively. Code: https://github.com/YangLing0818/VQGraph.
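The soft-token-assignment distillation can be sketched as follows: both teacher (GNN) and student (MLP) node representations are softly assigned to codebook entries, and the student is trained to match the teacher's assignment distribution. A minimal NumPy sketch under stated assumptions (the distance-based softmax, function names, and toy sizes are my own, not the paper's exact formulation):

```python
import numpy as np

def soft_assign(reps, codebook, tau=1.0):
    """Soft token assignment: softmax over negative squared distances
    from each node representation to every codebook entry."""
    d2 = ((reps[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K)
    logits = -d2 / tau
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def token_distill_loss(teacher_reps, student_reps, codebook, tau=1.0):
    """KL(teacher assignments || student assignments), averaged over nodes."""
    pt = soft_assign(teacher_reps, codebook, tau)
    ps = soft_assign(student_reps, codebook, tau)
    return float((pt * (np.log(pt + 1e-12) - np.log(ps + 1e-12))).sum(1).mean())

rng = np.random.default_rng(1)
codebook = rng.normal(size=(32, 8))                   # 32 discrete structure tokens
gnn_reps = rng.normal(size=(10, 8))                   # teacher (GNN) node outputs
mlp_reps = gnn_reps + 0.1 * rng.normal(size=(10, 8))  # imperfect student (MLP)
loss = token_distill_loss(gnn_reps, mlp_reps, codebook)
zero = token_distill_loss(gnn_reps, gnn_reps, codebook)  # identical reps -> 0
```

Matching distributions over tokens, rather than raw outputs, is what lets the codebook carry local-structure information from GNN to MLP.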

* arXiv admin note: text overlap with arXiv:1906.00446 by other authors 

Individual and Structural Graph Information Bottlenecks for Out-of-Distribution Generalization

Jun 28, 2023
Ling Yang, Jiayi Zheng, Heyuan Wang, Zhongyi Liu, Zhilin Huang, Shenda Hong, Wentao Zhang, Bin Cui

Out-of-distribution (OOD) graph generalization is critical for many real-world applications. Existing methods neglect to discard spurious or noisy features of the inputs, which are irrelevant to the label. Besides, they mainly conduct instance-level class-invariant graph learning and fail to utilize the structural class relationships between graph instances. In this work, we address these issues in a unified framework, dubbed Individual and Structural Graph Information Bottlenecks (IS-GIB). To remove spurious features caused by distribution shifts, we propose the Individual Graph Information Bottleneck (I-GIB), which discards irrelevant information by minimizing the mutual information between the input graph and its embedding. To leverage structural intra- and inter-domain correlations, we propose the Structural Graph Information Bottleneck (S-GIB). Specifically, for a batch of graphs from multiple domains, S-GIB first computes the pair-wise input-input, embedding-embedding, and label-label correlations. It then minimizes the mutual information between input-graph pairs and embedding pairs while maximizing the mutual information between embedding pairs and label pairs. The critical insight of S-GIB is to simultaneously discard spurious features and learn invariant features from a high-order perspective by maintaining class relationships under multiple distributional shifts. Finally, we unify I-GIB and S-GIB to form our complementary framework IS-GIB. Extensive experiments on both node- and graph-level tasks consistently demonstrate the superior generalization ability of IS-GIB. The code is available at https://github.com/YangLing0818/GraphOOD.
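Under the standard information-bottleneck reading of the description above, the two objectives can be written schematically (the symbols G for an input graph, Z for its embedding, Y for its label, and the trade-off weight β are my notation, not necessarily the paper's):

```latex
% I-GIB: compress away input information while preserving label information
\min_{p(Z \mid G)} \; I(G; Z) \;-\; \beta \, I(Z; Y)

% S-GIB: the same trade-off applied to pairs (i, j) within a batch,
% so class relationships are preserved across domains
\min \; I\big((G_i, G_j);\, (Z_i, Z_j)\big) \;-\; \beta \, I\big((Z_i, Z_j);\, (Y_i, Y_j)\big)
```

The first term removes spurious features; the second keeps the (pairwise, for S-GIB) label structure intact under distribution shift.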

* Accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE) 

Frozen Language Model Helps ECG Zero-Shot Learning

Mar 22, 2023
Jun Li, Che Liu, Sibo Cheng, Rossella Arcucci, Shenda Hong

The electrocardiogram (ECG) is one of the most commonly used non-invasive and convenient medical monitoring tools assisting in the clinical diagnosis of heart disease. Recently, deep learning (DL) techniques, particularly self-supervised learning (SSL), have demonstrated great potential in ECG classification. SSL pre-training achieves competitive performance with only a small amount of annotated data for fine-tuning. However, current SSL methods rely on the availability of annotated data and are unable to predict labels absent from the fine-tuning datasets. To address this challenge, we propose Multimodal ECG-Text Self-supervised pre-training (METS), the first work to use auto-generated clinical reports to guide ECG SSL pre-training. We use a trainable ECG encoder and a frozen language model to separately embed paired ECGs and machine-generated clinical reports. The SSL objective maximizes the similarity between a paired ECG and its auto-generated report while minimizing the similarity between the ECG and other reports. On downstream classification tasks, METS achieves around a 10% performance improvement without using any annotated data, via zero-shot classification, compared to other supervised and SSL baselines that rely on annotated data. Furthermore, METS achieves the highest recall and F1 scores on the MIT-BIH dataset, even though MIT-BIH contains different ECG classes than the pre-training dataset. These extensive experiments demonstrate the advantages of ECG-text multimodal self-supervised learning in terms of generalizability, effectiveness, and efficiency.
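The stated objective — maximize similarity for paired ECG/report, minimize it for mismatched pairs — has the shape of a symmetric CLIP-style InfoNCE loss. A minimal NumPy sketch under that assumption (function names, temperature, and toy sizes are illustrative):

```python
import numpy as np

def mets_contrastive_loss(ecg_emb, rep_emb, temperature=0.07):
    """Symmetric InfoNCE: row i of each matrix is a paired ECG / report,
    so the diagonal of the similarity matrix holds the positives."""
    e = ecg_emb / np.linalg.norm(ecg_emb, axis=1, keepdims=True)
    r = rep_emb / np.linalg.norm(rep_emb, axis=1, keepdims=True)
    logits = (e @ r.T) / temperature

    def xent_diag(lg):
        lg = lg - lg.max(axis=1, keepdims=True)         # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        idx = np.arange(len(lg))
        return -logp[idx, idx].mean()

    # Average the ECG->report and report->ECG directions.
    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))

rng = np.random.default_rng(2)
ecg_emb = rng.normal(size=(6, 32))   # trainable ECG-encoder outputs
rep_emb = rng.normal(size=(6, 32))   # frozen language-model report embeddings
random_loss = mets_contrastive_loss(ecg_emb, rep_emb)
aligned_loss = mets_contrastive_loss(ecg_emb, ecg_emb)  # perfectly paired case
```

Because the language model is frozen, only the ECG encoder moves under this loss, which is what lets the ECG embeddings inherit the report space's label semantics for zero-shot classification.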

Artificial Intelligence System for Detection and Screening of Cardiac Abnormalities using Electrocardiogram Images

Feb 10, 2023
Deyun Zhang, Shijia Geng, Yang Zhou, Weilun Xu, Guodong Wei, Kai Wang, Jie Yu, Qiang Zhu, Yongkui Li, Yonghong Zhao, Xingyue Chen, Rui Zhang, Zhaoji Fu, Rongbo Zhou, Yanqi E, Sumei Fan, Qinghao Zhao, Chuandong Cheng, Nan Peng, Liang Zhang, Linlin Zheng, Jianjun Chu, Hongbin Xu, Chen Tan, Jian Liu, Huayue Tao, Tong Liu, Kangyin Chen, Chenyang Jiang, Xingpeng Liu, Shenda Hong

Artificial intelligence (AI) systems have achieved expert-level performance in electrocardiogram (ECG) signal analysis. However, in underdeveloped countries or regions where the healthcare information system is imperfect, only paper ECGs can be provided. Analysis of real-world ECG images (photos or scans of paper ECGs) remains challenging due to complex environments and interference. In this study, we present an AI system developed to detect and screen cardiac abnormalities (CAs) from real-world ECG images. The system was evaluated on a large dataset of 52,357 patients from multiple regions and populations across the world. On the detection task, the AI system obtained areas under the receiver operating characteristic curve (AUC) of 0.996 (hold-out test), 0.994 (external test 1), 0.984 (external test 2), and 0.979 (external test 3). Meanwhile, the detection results of the AI system showed a strong correlation with the diagnoses of cardiologists (cardiologist 1: R=0.794, p<1e-3; cardiologist 2: R=0.812, p<1e-3). On the screening task, the AI system achieved AUCs of 0.894 (hold-out test) and 0.850 (external test). The screening performance of the AI system was better than that of the cardiologists (AI system 0.846 vs. cardiologist 1 0.520 vs. cardiologist 2 0.480). Our study demonstrates the feasibility of an accurate, objective, easy-to-use, fast, and low-cost AI system for CA detection and screening. The system has the potential to be used by healthcare professionals, caregivers, and general users to assess CAs from real-world ECG images.

* 47 pages, 29 figures 

Towards Better Time Series Contrastive Learning: A Dynamic Bad Pair Mining Approach

Feb 07, 2023
Xiang Lan, Hanshu Yan, Shenda Hong, Mengling Feng

Not all positive pairs are beneficial to time series contrastive learning. In this paper, we study two types of bad positive pairs that impair the quality of time series representations learned through contrastive learning: noisy positive pairs and faulty positive pairs. We show that, in the presence of noisy positive pairs, the model tends to simply learn the pattern of the noise (Noisy Alignment). Meanwhile, when faulty positive pairs arise, the model spends considerable effort aligning non-representative patterns (Faulty Alignment). To address this problem, we propose a Dynamic Bad Pair Mining (DBPM) algorithm, which reliably identifies and suppresses bad positive pairs in time series contrastive learning. DBPM utilizes a memory module to track the training behavior of each positive pair throughout training. This allows us to identify potential bad positive pairs at each epoch based on their historical training behaviors. The identified bad pairs are then down-weighted through a transformation module. Our experimental results show that DBPM effectively mitigates the negative impact of bad pairs and can easily be used as a plug-in to boost the performance of state-of-the-art methods. Code will be made publicly available.
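The memory-module idea can be sketched as follows: record each positive pair's loss per epoch and down-weight pairs whose historical mean is a statistical outlier. This is a toy reading of DBPM, not the paper's exact rule (the class name, the z-score test, and the 0.1 down-weight are my choices):

```python
import numpy as np

class BadPairMemory:
    """Track each positive pair's loss across epochs; flag pairs whose
    historical mean loss deviates strongly from the batch statistics."""
    def __init__(self, n_pairs):
        self.history = [[] for _ in range(n_pairs)]

    def update(self, epoch_losses):
        """Record this epoch's per-pair contrastive losses."""
        for hist, loss in zip(self.history, epoch_losses):
            hist.append(float(loss))

    def weights(self, k=1.5, down_weight=0.1):
        """Return per-pair weights: outliers (by z-score of historical
        mean loss) are suppressed instead of removed outright."""
        means = np.array([np.mean(h) for h in self.history])
        mu, sd = means.mean(), means.std() + 1e-9
        w = np.ones_like(means)
        w[np.abs(means - mu) > k * sd] = down_weight
        return w

mem = BadPairMemory(n_pairs=5)
for epoch in range(3):
    # pair 4 consistently shows an abnormally high loss (e.g. a noisy positive)
    mem.update([0.5, 0.6, 0.55, 0.5, 5.0])
w = mem.weights()
```

Using the historical mean rather than a single epoch's loss is what makes the identification robust to ordinary training noise.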

* Preprint. Under review 

Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training

Nov 21, 2022
Ling Yang, Zhilin Huang, Yang Song, Shenda Hong, Guohao Li, Wentao Zhang, Bin Cui, Bernard Ghanem, Ming-Hsuan Yang

Generating images from graph-structured inputs, such as scene graphs, is uniquely challenging due to the difficulty of aligning nodes and connections in graphs with objects and their relations in images. Most existing methods address this challenge by using scene layouts, which are image-like representations of scene graphs designed to capture the coarse structures of scene images. Because scene layouts are manually crafted, the alignment with images may not be fully optimized, causing suboptimal compliance between the generated images and the original scene graphs. To tackle this issue, we propose to learn scene graph embeddings by directly optimizing their alignment with images. Specifically, we pre-train an encoder to extract both global and local information from scene graphs that are predictive of the corresponding images, relying on two loss functions: masked autoencoding loss and contrastive loss. The former trains embeddings by reconstructing randomly masked image regions, while the latter trains embeddings to discriminate between compliant and non-compliant images according to the scene graph. Given these embeddings, we build a latent diffusion model to generate images from scene graphs. The resulting method, called SGDiff, allows for the semantic manipulation of generated images by modifying scene graph nodes and connections. On the Visual Genome and COCO-Stuff datasets, we demonstrate that SGDiff outperforms state-of-the-art methods, as measured by both the Inception Score and Fréchet Inception Distance (FID) metrics. We will release our source code and trained models at https://github.com/YangLing0818/SGDiff.
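The two pre-training losses can be sketched side by side: a masked autoencoding term computed only on masked regions, and a contrastive term that scores the compliant image above a non-compliant one. A minimal NumPy sketch (function names and toy shapes are illustrative; the real model operates on learned image features, not raw patches):

```python
import numpy as np

def masked_recon_loss(pred_patches, true_patches, mask):
    """Masked autoencoding: MSE only on the randomly masked image regions."""
    diff2 = ((pred_patches - true_patches) ** 2).mean(axis=-1)
    return float((diff2 * mask).sum() / mask.sum())

def compliance_contrastive_loss(sg_emb, img_pos, img_neg, temperature=0.1):
    """Binary InfoNCE: for each scene-graph embedding, score the compliant
    image above a non-compliant one."""
    def sim(a, b):
        return (a * b).sum(-1) / (np.linalg.norm(a, axis=-1) *
                                  np.linalg.norm(b, axis=-1))
    s_pos = sim(sg_emb, img_pos) / temperature
    s_neg = sim(sg_emb, img_neg) / temperature
    m = np.maximum(s_pos, s_neg)                       # numerical stability
    return float((-(s_pos - m) +
                  np.log(np.exp(s_pos - m) + np.exp(s_neg - m))).mean())

rng = np.random.default_rng(3)
patches = rng.normal(size=(16, 48))               # 16 image patches
pred = patches + 0.1 * rng.normal(size=(16, 48))  # imperfect reconstruction
mask = np.zeros(16); mask[:8] = 1.0               # which patches were masked
sg_emb = rng.normal(size=(4, 64))                 # scene-graph embeddings
total = masked_recon_loss(pred, patches, mask) + \
        compliance_contrastive_loss(sg_emb,
                                    rng.normal(size=(4, 64)),   # compliant
                                    rng.normal(size=(4, 64)))   # non-compliant
```

The reconstruction term forces local detail into the embedding while the contrastive term enforces global graph-image compliance, matching the global/local split the abstract describes.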

* Code and models shall be released at https://github.com/YangLing0818/SGDiff 

Continuous Diagnosis and Prognosis by Controlling the Update Process of Deep Neural Networks

Oct 06, 2022
Chenxi Sun, Hongyan Li, Moxian Song, Derun Cai, Baofeng Zhang, Shenda Hong

Continuous diagnosis and prognosis are essential for intensive care patients. They provide more opportunities for timely treatment and rational resource allocation, especially for sepsis, a main cause of death in the ICU, and COVID-19, a new worldwide epidemic. Although deep learning methods have shown great superiority in many medical tasks, when performing diagnosis and prognosis in the continuous mode they tend to forget catastrophically, overfit, and deliver results too late. In this work, we summarize the three requirements of this task, propose a new concept, continuous classification of time series (CCTS), and design a novel model training method, the restricted update strategy of neural networks (RU). In the context of continuous prognosis, our method outperformed all baselines and achieved average accuracies of 90%, 97%, and 85% on sepsis prognosis, COVID-19 mortality prediction, and eight-disease classification, respectively. Moreover, our method can endow deep learning with interpretability, having the potential to explore disease mechanisms and provide a new horizon for medical research. We achieved disease staging for sepsis and COVID-19, discovering four stages and three stages, respectively, with their typical biomarkers. Further, our method is a data-agnostic and model-agnostic plug-in; it can be used to continuously prognose other diseases with staging and even to implement CCTS in other fields.

* 41 pages, 15 figures 

Diffusion Models: A Comprehensive Survey of Methods and Applications

Sep 15, 2022
Ling Yang, Zhilong Zhang, Shenda Hong, Runsheng Xu, Yue Zhao, Yingxia Shao, Wentao Zhang, Ming-Hsuan Yang, Bin Cui

Diffusion models are a class of deep generative models that have shown impressive results on various tasks, backed by a solid theoretical foundation. Despite this demonstrated success over state-of-the-art approaches, diffusion models often entail costly sampling procedures and sub-optimal likelihood estimation. Significant efforts have been made to improve diffusion models in various respects. In this article, we present a comprehensive review of existing variants of diffusion models. Specifically, we provide a taxonomy of diffusion models, categorizing them into three types: sampling-acceleration enhancement, likelihood-maximization enhancement, and data-generalization enhancement. We also introduce other generative models (i.e., variational autoencoders, generative adversarial networks, normalizing flows, autoregressive models, and energy-based models) and discuss the connections between diffusion models and these models. We then review the applications of diffusion models, including computer vision, natural language processing, waveform signal processing, multi-modal modeling, molecular graph generation, time series modeling, and adversarial purification. Furthermore, we propose new perspectives on the development of generative models. GitHub: https://github.com/YangLing0818/Diffusion-Models-Papers-Survey-Taxonomy.
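As background for the survey's scope, the forward (noising) process shared by the DDPM-style models it reviews has a simple closed form. A minimal NumPy sketch with a linear beta schedule (the schedule values are common defaults, not specific to this survey):

```python
import numpy as np

# Closed-form forward process of a DDPM:
#   q(x_t | x_0) = N( sqrt(abar_t) * x_0, (1 - abar_t) * I )
# where abar_t is the cumulative product of (1 - beta_s). A denoiser
# eps_theta(x_t, t) is then trained by simple MSE against the injected noise.
T = 1000
betas = np.linspace(1e-4, 0.02, T)     # linear noise schedule
abar = np.cumprod(1.0 - betas)         # signal retention per timestep

def diffuse(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) and return the noise used."""
    eps = rng.normal(size=x0.shape)
    xt = np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * eps
    return xt, eps

rng = np.random.default_rng(4)
x0 = rng.normal(size=(2, 8))
xt, eps = diffuse(x0, t=999, rng=rng)
# At the final step abar is near 0, so x_T is close to pure Gaussian noise;
# sampling-acceleration work in the survey targets inverting this chain
# in far fewer than T steps.
```

The sampling-acceleration, likelihood-maximization, and data-generalization branches of the taxonomy all modify some part of this chain: its reverse-time solver, its training weighting, or the space the data lives in.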

* 33 pages, citing 255 papers, project: https://github.com/YangLing0818/Diffusion-Models-Papers-Survey-Taxonomy 

Confidence-Guided Learning Process for Continuous Classification of Time Series

Aug 14, 2022
Chenxi Sun, Moxian Song, Derun Cai, Baofeng Zhang, Shenda Hong, Hongyan Li

In the real world, the class of a time series is usually labeled at the final time, but many applications require classifying the time series at every time point; e.g., the outcome of a critical patient is only determined at the end, yet the patient should be diagnosed at all times for timely treatment. We therefore propose a new concept: Continuous Classification of Time Series (CCTS). It requires the model to learn data at different time stages. But a time series evolves dynamically, leading to different data distributions, and when a model learns multiple distributions, it tends to forget or overfit. We suggest that a meaningful learning schedule is possible, based on an interesting observation: measured by confidence, the process of a model learning multiple distributions is similar to the process of a human learning multiple pieces of knowledge. Thus, we propose a novel Confidence-guided method for CCTS (C3TS). It imitates the alternating human confidence described by the Dunning-Kruger effect. We define objective confidence to arrange the data and self-confidence to control the learning duration. Experiments on four real-world datasets show that C3TS is more accurate than all baselines for CCTS.

* 20 pages, 12 figures 