Haishuai Wang

Contrast Everything: A Hierarchical Contrastive Framework for Medical Time-Series

Oct 28, 2023
Yihe Wang, Yu Han, Haishuai Wang, Xiang Zhang

Contrastive representation learning is crucial in medical time series analysis because it alleviates the dependency on labor-intensive, domain-specific, and scarce expert annotations. However, existing contrastive learning methods primarily focus on a single data level, which fails to fully exploit the intricate nature of medical time series. To address this issue, we present COMET, an innovative hierarchical framework that leverages data consistencies at all inherent levels of medical time series. Our carefully designed model systematically captures data consistency at four levels: observation, sample, trial, and patient. By developing contrastive losses at multiple levels, we learn effective representations that preserve comprehensive data consistency, maximizing information utilization in a self-supervised manner. We conduct experiments in the challenging patient-independent setting, comparing COMET against six baselines on three diverse datasets, which include ECG signals for myocardial infarction and EEG signals for Alzheimer's and Parkinson's diseases. The results demonstrate that COMET consistently outperforms all baselines, particularly in setups with 10% and 1% labeled-data fractions across all datasets. These results underscore the significant impact of our framework in advancing contrastive representation learning techniques for medical time series. The source code is available at https://github.com/DL4mHealth/COMET.
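
Below is a minimal, hypothetical PyTorch sketch of how contrastive losses at the observation, sample, trial, and patient levels could be combined into one objective; the group-id interface, weights, and toy data are illustrative assumptions, not the released COMET code (see the repository above for the actual implementation).

    # Hypothetical multi-level contrastive objective: rows of z sharing a group id
    # at a given level are treated as positives for that level (illustration only).
    import torch
    import torch.nn.functional as F

    def info_nce(z, group_ids, temperature=0.1):
        """InfoNCE over all pairs; same group id = positive pair."""
        z = F.normalize(z, dim=1)
        sim = z @ z.t() / temperature
        self_mask = torch.eye(len(z), dtype=torch.bool)
        pos = (group_ids.unsqueeze(0) == group_ids.unsqueeze(1)) & ~self_mask
        exp_sim = torch.exp(sim.masked_fill(self_mask, float('-inf')))
        num = (exp_sim * pos).sum(dim=1).clamp_min(1e-12)
        denom = exp_sim.sum(dim=1)
        valid = pos.any(dim=1)                        # rows with at least one positive
        if not valid.any():
            return z.new_zeros(())
        return -torch.log(num[valid] / denom[valid]).mean()

    def hierarchical_loss(z, obs_ids, sample_ids, trial_ids, patient_ids,
                          weights=(1.0, 1.0, 1.0, 1.0)):
        """Weighted sum of contrastive losses at the four data levels."""
        levels = (obs_ids, sample_ids, trial_ids, patient_ids)
        return sum(w * info_nce(z, ids) for w, ids in zip(weights, levels))

    # Toy usage: 8 embeddings from 2 patients, 2 trials each, 2 augmented views per sample.
    z = torch.randn(8, 32)
    patient = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])
    trial   = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
    sample  = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
    obs     = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])  # two views of each observation window
    print(hierarchical_loss(z, obs, sample, trial, patient))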

* Accepted by NeurIPS 2023; 24 pages (13 pages main paper + 11 pages supplementary materials) 

Partition Speeds Up Learning Implicit Neural Representations Based on Exponential-Increase Hypothesis

Oct 22, 2023
Ke Liu, Feng Liu, Haishuai Wang, Ning Ma, Jiajun Bu, Bo Han

$\textit{Implicit neural representations}$ (INRs) aim to learn a $\textit{continuous function}$ (i.e., a neural network) to represent an image, where the input and output of the function are pixel coordinates and RGB/gray values, respectively. However, images tend to consist of many objects whose colors are not perfectly consistent, so an image is actually a $\textit{discontinuous piecewise function}$ that cannot be well estimated by a continuous function. In this paper, we empirically show that if a neural network is forced to fit a discontinuous piecewise function to a fixed small error, the time cost increases exponentially with the number of boundaries in the spatial domain of the target signal. We name this phenomenon the $\textit{exponential-increase}$ hypothesis. Under this hypothesis, learning INRs for images with many objects converges very slowly. To address this issue, we first prove that partitioning a complex signal into several sub-regions and utilizing piecewise INRs to fit that signal can significantly speed up convergence. Based on this fact, we introduce a simple partition mechanism to boost the performance of two INR methods for image reconstruction: one for learning INRs and the other for learning-to-learn INRs. In both cases, we partition an image into different sub-regions and dedicate a smaller network to each part. In addition, we propose two partition rules based on regular grids and semantic segmentation maps, respectively. Extensive experiments validate the effectiveness of the proposed partitioning methods for learning an INR for a single image (the ordinary learning framework) and for the learning-to-learn framework.
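
As a rough illustration of the regular-grid partition rule (my own sketch under simplifying assumptions, not the authors' code), one can split the coordinate grid into tiles and fit a small, independent coordinate-MLP to each tile, so each tile's boundaries are handled by its own network:

    # Illustrative grid-partition INR: one small coordinate-MLP per image tile.
    import torch
    import torch.nn as nn

    def make_mlp(hidden=64):
        return nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                             nn.Linear(hidden, hidden), nn.ReLU(),
                             nn.Linear(hidden, 3))          # (x, y) -> RGB

    def fit_partitioned_inr(image, grid=2, steps=200):
        """image: (H, W, 3) tensor in [0, 1]; returns one fitted MLP per tile."""
        H, W, _ = image.shape
        hs, ws = H // grid, W // grid
        models = []
        for i in range(grid):
            for j in range(grid):
                tile = image[i*hs:(i+1)*hs, j*ws:(j+1)*ws]
                ys, xs = torch.meshgrid(torch.linspace(-1, 1, hs),
                                        torch.linspace(-1, 1, ws), indexing='ij')
                coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
                target = tile.reshape(-1, 3)
                net = make_mlp()
                opt = torch.optim.Adam(net.parameters(), lr=1e-3)
                for _ in range(steps):                       # each tile converges independently
                    loss = ((net(coords) - target) ** 2).mean()
                    opt.zero_grad(); loss.backward(); opt.step()
                models.append(net)
        return models

    models = fit_partitioned_inr(torch.rand(64, 64, 3), grid=2, steps=50)
    print(len(models), "tile INRs fitted")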

* Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023 

Graph Neural Architecture Search with GPT-4

Sep 30, 2023
Haishuai Wang, Yang Gao, Xin Zheng, Peng Zhang, Hongyang Chen, Jiajun Bu

Graph Neural Architecture Search (GNAS) has shown promising results in automatically designing graph neural networks. However, GNAS still requires intensive human labor and rich domain knowledge to design the search space and search strategy. In this paper, we integrate GPT-4 into GNAS and propose a new GPT-4-based Graph Neural Architecture Search method (GPT4GNAS for short). The basic idea of our method is to design a new class of prompts that guide GPT-4 toward generating graph neural architectures. The prompts describe the search space, the search strategy, and the search feedback of GNAS. By iteratively running GPT-4 with these prompts, GPT4GNAS generates more accurate graph neural networks with faster convergence. Experimental results show that embedding GPT-4 into GNAS outperforms state-of-the-art GNAS methods.
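
A hypothetical sketch of the iterative prompting loop this describes; the actual GPT4GNAS prompts, search space, and GNN training pipeline are not reproduced here, and query_llm and train_and_eval are placeholder callables supplied by the user.

    # Hypothetical prompt loop for LLM-guided graph architecture search (illustration only).
    import json

    SEARCH_SPACE = {
        "aggregation": ["gcn", "gat", "sage", "gin"],
        "activation": ["relu", "elu", "tanh"],
        "layers": [2, 3, 4],
        "hidden_dim": [64, 128, 256],
    }

    def build_prompt(history):
        """Describe the search space, strategy, and past feedback for the LLM."""
        return (
            "You are searching graph neural network architectures.\n"
            f"Search space: {json.dumps(SEARCH_SPACE)}\n"
            "Strategy: propose architectures likely to beat the best so far.\n"
            f"Feedback (architecture -> validation accuracy): {json.dumps(history)}\n"
            "Reply with a JSON object choosing one value per field."
        )

    def search(query_llm, train_and_eval, iterations=10):
        """query_llm: str -> str (e.g. a GPT-4 call); train_and_eval: dict -> float."""
        history = []
        for _ in range(iterations):
            arch = json.loads(query_llm(build_prompt(history)))
            acc = train_and_eval(arch)           # train the proposed GNN, get accuracy
            history.append({"architecture": arch, "accuracy": acc})
        return max(history, key=lambda h: h["accuracy"])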


Multi-View Fusion and Distillation for Subgrade Distresses Detection based on 3D-GPR

Aug 09, 2023
Chunpeng Zhou, Kangjie Ning, Haishuai Wang, Zhi Yu, Sheng Zhou, Jiajun Bu

The application of 3D ground-penetrating radar (3D-GPR) to subgrade distress detection has gained widespread popularity. To enhance detection efficiency and accuracy, pioneering studies have attempted to adopt automatic detection techniques, particularly deep learning. However, existing works typically rely on traditional 1D A-scan, 2D B-scan, or 3D C-scan GPR data, resulting in either insufficient spatial information or high computational complexity. To address these challenges, we introduce a novel methodology for the subgrade distress detection task that leverages the multi-view information in 3D-GPR data. We construct a real multi-view image dataset derived from the original 3D-GPR data for the detection task, which provides richer spatial information than A-scan and B-scan data while reducing computational complexity compared with C-scan data. We then develop a novel \textbf{M}ulti-\textbf{V}iew \textbf{F}usion and \textbf{D}istillation framework, \textbf{GPR-MVFD}, specifically designed to make optimal use of the multi-view GPR dataset. This framework incorporates multi-view distillation and attention-based fusion to extract discriminative features for subgrade distresses. In addition, a self-adaptive learning mechanism is adopted to stabilize model training and prevent performance degeneration in each branch. Extensive experiments on this new GPR benchmark demonstrate the effectiveness and efficiency of our proposed framework. Our framework outperforms not only existing GPR baselines but also state-of-the-art methods from multi-view learning, multi-modal learning, and knowledge distillation. We will release the constructed multi-view GPR dataset with expert-annotated labels and the source code of the proposed framework.
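
A simplified sketch of attention-based view fusion combined with a per-view feature-distillation term, assuming precomputed per-view features; the module names, dimensions, and teacher features are placeholders, not the GPR-MVFD implementation.

    # Attention-weighted fusion over view features plus a simple distillation term.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AttentionFusion(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.score = nn.Linear(dim, 1)              # scores each view's feature

        def forward(self, view_feats):                  # view_feats: (batch, n_views, dim)
            weights = torch.softmax(self.score(view_feats), dim=1)
            return (weights * view_feats).sum(dim=1)    # weighted sum over views

    def distillation_loss(student_feats, teacher_feats):
        """Align each view branch with a frozen per-view teacher feature."""
        return F.mse_loss(F.normalize(student_feats, dim=-1),
                          F.normalize(teacher_feats, dim=-1))

    # Toy usage: 4 views, 128-d features, binary distress classification head.
    fusion, head = AttentionFusion(128), nn.Linear(128, 2)
    views = torch.randn(8, 4, 128)
    teacher = torch.randn(8, 4, 128)
    logits = head(fusion(views))
    loss = F.cross_entropy(logits, torch.randint(0, 2, (8,))) + distillation_loss(views, teacher)
    print(loss.item())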


CPDG: A Contrastive Pre-Training Method for Dynamic Graph Neural Networks

Jul 24, 2023
Yuanchen Bei, Hao Xu, Sheng Zhou, Huixuan Chi, Haishuai Wang, Mengdi Zhang, Zhao Li, Jiajun Bu

Dynamic graph data mining has gained popularity in recent years because of the rich information contained in dynamic graphs and their wide use in the real world. Despite advances in dynamic graph neural networks (DGNNs), this rich information and the diversity of downstream tasks pose significant difficulties for applying DGNNs in industrial scenarios. To this end, we propose to address these difficulties via pre-training and present CPDG, a Contrastive Pre-training method for Dynamic Graph neural networks. CPDG tackles the key challenges of DGNN pre-training, namely generalization and long-short-term modeling capability, through a flexible structural-temporal subgraph sampler together with structural-temporal contrastive pre-training schemes. Extensive experiments on both large-scale research and industrial dynamic graph datasets show that CPDG outperforms existing methods in dynamic graph pre-training for various downstream tasks under three transfer settings.
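
A sketch of what one structural-temporal contrastive pre-training step might look like, with the subgraph samplers and DGNN encoder abstracted away as callables; this is an illustration of the idea, not the CPDG code.

    # Contrast a structure-based view and a time-based view of each node.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def contrastive_loss(z1, z2, temperature=0.2):
        """InfoNCE between two views: row i of z1 is positive with row i of z2."""
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        logits = z1 @ z2.t() / temperature
        targets = torch.arange(len(z1))
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

    def pretrain_step(encoder, sample_structural_view, sample_temporal_view, nodes, optimizer):
        z_struct = encoder(sample_structural_view(nodes))   # neighbors picked by structure
        z_temp = encoder(sample_temporal_view(nodes))        # neighbors picked by recency
        loss = contrastive_loss(z_struct, z_temp)
        optimizer.zero_grad(); loss.backward(); optimizer.step()
        return loss.item()

    # Toy usage with stand-in components (random features instead of sampled subgraphs).
    encoder = nn.Linear(16, 32)
    fake_view = lambda nodes: torch.randn(len(nodes), 16)
    opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
    print(pretrain_step(encoder, fake_view, fake_view, torch.arange(64), opt))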

* 13 pages, 6 figures 

Hierarchical Network with Decoupled Knowledge Distillation for Speech Emotion Recognition

Mar 09, 2023
Ziping Zhao, Huan Wang, Haishuai Wang, Bjorn Schuller

The goal of Speech Emotion Recognition (SER) is to enable computers to recognize the emotion category of a given utterance in the same way that humans do. The accuracy of SER depends strongly on the validity of the utterance-level representation learned by the model. Nevertheless, the "dark knowledge" carried by non-target classes has been largely ignored by previous studies. In this paper, we propose a hierarchical network, called DKDFMH, which employs decoupled knowledge distillation in a deep convolutional neural network with a fused multi-head attention mechanism. Our approach applies logit distillation to obtain higher-level semantic features from different scales of attention sets and to exploit the knowledge carried by non-target classes, thus guiding the model to focus more on the differences between sentiment features. To validate the effectiveness of our model, we conducted experiments on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset, achieving competitive performance of 79.1% weighted accuracy (WA) and 77.1% unweighted accuracy (UA). To the best of our knowledge, this is the first time since 2015 that logit distillation has returned to state-of-the-art performance on this task.
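
For readers unfamiliar with decoupled knowledge distillation, here is a hedged sketch of a decoupled logit-distillation loss split into a target-class term and a non-target-class term; the weighting and temperature values are illustrative assumptions, and this is not the DKDFMH code.

    # Decoupled logit distillation: target-class KD plus non-target-class KD.
    import torch
    import torch.nn.functional as F

    def kl(p, q):
        """KL(p || q) per row, summed over classes; safe where p has zeros."""
        p, q = p.clamp_min(1e-12), q.clamp_min(1e-12)
        return (p * (p.log() - q.log())).sum(dim=1).mean()

    def decoupled_kd(student_logits, teacher_logits, labels, alpha=1.0, beta=8.0, T=4.0):
        mask = F.one_hot(labels, student_logits.size(1)).bool()
        p_s = torch.softmax(student_logits / T, dim=1)
        p_t = torch.softmax(teacher_logits / T, dim=1)

        # Target-class term: binary distribution (target class vs. all the rest).
        bin_s = torch.stack([p_s[mask], 1 - p_s[mask]], dim=1)
        bin_t = torch.stack([p_t[mask], 1 - p_t[mask]], dim=1)
        tckd = kl(bin_t, bin_s)

        # Non-target-class term: distribution over the remaining classes only.
        n_s = torch.softmax((student_logits / T).masked_fill(mask, float('-inf')), dim=1)
        n_t = torch.softmax((teacher_logits / T).masked_fill(mask, float('-inf')), dim=1)
        nckd = kl(n_t, n_s)

        return (alpha * tckd + beta * nckd) * T * T

    # Toy usage with 4 emotion classes.
    s, t = torch.randn(8, 4), torch.randn(8, 4)
    print(decoupled_kd(s, t, torch.randint(0, 4, (8,))).item())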

* 5 pages, 4 figures; ICASSP 2023 

Hilbert Distillation for Cross-Dimensionality Networks

Nov 08, 2022
Dian Qin, Haishuai Wang, Zhe Liu, Hongjia Xu, Sheng Zhou, Jiajun Bu

3D convolutional neural networks have shown superior performance in processing volumetric data such as video and medical imaging. However, this competitive performance comes with huge computational costs, far beyond those of 2D networks. In this paper, we propose a novel Hilbert curve-based cross-dimensionality distillation approach that transfers knowledge from 3D networks to improve the performance of 2D networks. The proposed Hilbert Distillation (HD) method preserves structural information via the Hilbert curve, which maps high-dimensional (>=2) representations to one-dimensional continuous space-filling curves. Since the distilled 2D networks are supervised by curves converted from dimensionally heterogeneous 3D features, the 2D networks gain an informative view for learning the structural information embedded in well-trained high-dimensional representations. We further propose a Variable-length Hilbert Distillation (VHD) method that dynamically shortens the walking stride of the Hilbert curve in activation feature areas and lengthens it in context feature areas, forcing the 2D networks to pay more attention to learning from activation features. The proposed algorithm outperforms current state-of-the-art distillation techniques adapted to cross-dimensionality distillation on two classification tasks. Moreover, the 2D networks distilled by the proposed method achieve performance competitive with the original 3D networks, indicating that lightweight distilled 2D networks could potentially substitute for cumbersome 3D networks in real-world scenarios.
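
To make the idea concrete, here is a simplified sketch of Hilbert-curve feature flattening for distillation in the 2D case, with the teacher's 3D features average-pooled over depth first; this is an assumption-laden illustration, not the HD/VHD implementation (in particular, the variable-stride mechanism is omitted).

    # Flatten spatial feature maps along a Hilbert curve and align student with teacher.
    import torch
    import torch.nn.functional as F

    def hilbert_d2xy(order, d):
        """Index d along a Hilbert curve -> (x, y) on a 2^order x 2^order grid."""
        x = y = 0
        t, s, n = d, 1, 1 << order
        while s < n:
            rx = 1 & (t // 2)
            ry = 1 & (t ^ rx)
            if ry == 0:
                if rx == 1:
                    x, y = s - 1 - x, s - 1 - y
                x, y = y, x
            x, y, t, s = x + s * rx, y + s * ry, t // 4, s * 2
        return x, y

    def hilbert_flatten(feat):
        """feat: (B, C, H, H) with H a power of two -> (B, C, H*H) in Hilbert order."""
        order = feat.size(-1).bit_length() - 1
        coords = [hilbert_d2xy(order, d) for d in range(feat.size(-1) ** 2)]
        xs, ys = zip(*coords)
        return feat[..., list(ys), list(xs)]        # index spatial dims along the curve

    def hilbert_distill_loss(student_2d, teacher_3d):
        t2d = teacher_3d.mean(dim=2)                # pool the depth axis: (B, C, H, W)
        s, t = hilbert_flatten(student_2d), hilbert_flatten(t2d)
        return F.mse_loss(F.normalize(s, dim=-1), F.normalize(t, dim=-1))

    print(hilbert_distill_loss(torch.randn(2, 16, 8, 8), torch.randn(2, 16, 4, 8, 8)).item())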

* Accepted at NeurIPS 2022 

Logo-2K+: A Large-Scale Logo Dataset for Scalable Logo Classification

Nov 11, 2019
Jing Wang, Weiqing Min, Sujuan Hou, Shengnan Ma, Yuanjie Zheng, Haishuai Wang, Shuqiang Jiang

Logo classification has gained increasing attention for its various applications, such as copyright infringement detection, product recommendation, and contextual advertising. Compared with other types of object images, real-world logo images exhibit larger variation in logo appearance and more complex backgrounds, which makes recognizing logos from images challenging. To support efforts toward scalable logo classification, we have curated Logo-2K+, a new large-scale, publicly available, real-world logo dataset with 2,341 categories and 167,140 images. Compared with existing popular logo datasets, such as FlickrLogos-32 and LOGO-Net, Logo-2K+ covers logo categories more comprehensively and contains a larger number of logo images. Moreover, we propose a Discriminative Region Navigation and Augmentation Network (DRNA-Net), which is capable of discovering more informative logo regions and augmenting these image regions for logo classification. DRNA-Net consists of four sub-networks: the navigator sub-network first selects informative logo-relevant regions guided by the teacher sub-network, which evaluates each region's confidence of belonging to the ground-truth logo class; the data augmentation sub-network then augments the selected regions via both region cropping and region dropping; finally, the scrutinizer sub-network fuses features from the augmented regions and the whole image for logo classification. Comprehensive experiments on Logo-2K+ and three other existing benchmark datasets demonstrate the effectiveness of the proposed method. Logo-2K+ and the proposed strong baseline DRNA-Net are expected to further the development of scalable logo image recognition, and the Logo-2K+ dataset can be found at https://github.com/msn199959/Logo-2k-plus-Dataset.
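
A rough sketch of the crop-and-drop region augmentation and feature fusion described above, with stand-in networks and hard-coded region boxes and scores; the navigator and teacher sub-networks that actually produce the regions and their scores are not shown, and all names here are hypothetical placeholders rather than the DRNA-Net implementation.

    # Region cropping (zoom in), region dropping (zero out), then feature fusion.
    import torch
    import torch.nn.functional as F

    def crop_and_drop(image, box):
        """box = (y0, y1, x0, x1); returns a zoomed crop and a copy with the region dropped."""
        y0, y1, x0, x1 = box
        crop = F.interpolate(image[:, :, y0:y1, x0:x1], size=image.shape[-2:],
                             mode='bilinear', align_corners=False)   # region cropping
        dropped = image.clone()
        dropped[:, :, y0:y1, x0:x1] = 0                              # region dropping
        return crop, dropped

    def fuse_logits(backbone, classifier, image, boxes, scores, k=2):
        """Keep the k highest-scoring regions, augment them, and fuse with the whole image."""
        top = torch.topk(torch.as_tensor(scores), k).indices
        feats = [backbone(image)]
        for i in top:
            crop, dropped = crop_and_drop(image, boxes[i])
            feats += [backbone(crop), backbone(dropped)]
        return classifier(torch.cat(feats, dim=1))                   # fused prediction

    # Toy usage with stand-in networks.
    backbone = torch.nn.Sequential(torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten())
    classifier = torch.nn.Linear(3 * 5, 2341)       # whole image + 2 regions x (crop, drop)
    image = torch.rand(1, 3, 64, 64)
    boxes, scores = [(0, 32, 0, 32), (16, 48, 16, 48), (32, 64, 32, 64)], [0.9, 0.5, 0.7]
    print(fuse_logits(backbone, classifier, image, boxes, scores).shape)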

* Accepted by AAAI 2020 