Lifang He

Lehigh University

Unsupervised Skin Lesion Segmentation via Structural Entropy Minimization on Multi-Scale Superpixel Graphs

Sep 05, 2023
Guangjie Zeng, Hao Peng, Angsheng Li, Zhiwei Liu, Chunyang Liu, Philip S. Yu, Lifang He

Skin lesion segmentation is a fundamental task in dermoscopic image analysis. The complex pixel features within the lesion region impede segmentation accuracy, and existing deep learning-based methods often lack interpretability on this problem. In this work, we propose a novel unsupervised Skin Lesion sEgmentation framework based on structural entropy and isolation forest outlier Detection, namely SLED. Specifically, skin lesions are segmented by minimizing the structural entropy of a superpixel graph constructed from the dermoscopic image. We then characterize the consistency of healthy skin features and devise a novel multi-scale segmentation mechanism based on outlier detection, which improves segmentation accuracy by leveraging superpixel features at multiple scales. We conduct experiments on four skin lesion benchmarks and compare SLED with nine representative unsupervised segmentation methods. Experimental results demonstrate the superiority of the proposed framework, and case studies further illustrate the effectiveness of SLED.

* 10 pages, 8 figures, conference. Accepted by IEEE ICDM 2023 
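
The multi-scale outlier-detection mechanism lends itself to a compact illustration. The Python sketch below is a simplified reading of the abstract rather than the SLED implementation: it assumes scikit-image and scikit-learn, substitutes plain mean-color superpixel features for the structural-entropy partitioning, and lets an isolation forest flag lesion candidates at several superpixel scales before merging the votes.

```python
# NOT the authors' code: a rough sketch of the outlier-detection half of
# the pipeline, with mean-color superpixel features standing in for the
# structural-entropy graph partitioning.
import numpy as np
from skimage.segmentation import slic
from sklearn.ensemble import IsolationForest

def lesion_mask_multiscale(image, scales=(100, 200, 400)):
    """image: float RGB array in [0, 1], shape (H, W, 3)."""
    votes = np.zeros(image.shape[:2], dtype=float)
    for n_segments in scales:  # one pass per superpixel scale
        labels = slic(image, n_segments=n_segments, start_label=0)
        # Mean color per superpixel serves as its feature vector; healthy
        # skin is assumed to form the consistent, inlying majority.
        feats = np.stack([image[labels == s].mean(axis=0)
                          for s in range(labels.max() + 1)])
        # Isolation forest marks outliers with -1 (candidate lesion regions).
        flags = IsolationForest(random_state=0).fit_predict(feats)
        votes += (flags == -1)[labels]
    return votes >= len(scales) / 2  # majority vote across scales
```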

One-shot Joint Extraction, Registration and Segmentation of Neuroimaging Data

Jul 27, 2023
Yao Su, Zhentian Qian, Lei Ma, Lifang He, Xiangnan Kong

Brain extraction, registration and segmentation are indispensable preprocessing steps in neuroimaging studies. The aim is to extract the brain from raw imaging scans (i.e., extraction step), align it with a target brain image (i.e., registration step) and label the anatomical brain regions (i.e., segmentation step). Conventional studies typically focus on developing separate methods for the extraction, registration and segmentation tasks in a supervised setting. The performance of these methods largely depends on the quantity of training samples and the extent of visual inspection carried out by experts for error correction. Nevertheless, collecting voxel-level labels and performing manual quality control on high-dimensional neuroimages (e.g., 3D MRI) are expensive and time-consuming in many medical studies. In this paper, we study the problem of one-shot joint extraction, registration and segmentation in neuroimaging data, which exploits only one labeled template image (a.k.a. atlas) and a few unlabeled raw images for training. We propose a unified end-to-end framework, called JERS, to jointly optimize the extraction, registration and segmentation tasks, allowing feedback among them. Specifically, we use a group of extraction, registration and segmentation modules to learn the extraction mask, transformation and segmentation mask, where the modules are interconnected and mutually reinforced by self-supervision. Empirical results on real-world datasets demonstrate that our proposed method performs exceptionally well on the extraction, registration and segmentation tasks. Our code and data can be found at https://github.com/Anonymous4545/JERS

* Published as a research track paper at KDD 2023. Code: https://github.com/Anonymous4545/JERS 
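
The coupling among the three tasks can be pictured with a short PyTorch sketch. The wiring below is hypothetical (2D for brevity; `extractor` and `registrator` stand in for the paper's module groups) and only illustrates how a single labeled atlas can supervise extraction, registration and segmentation at once.

```python
# Hypothetical module wiring, not the released JERS code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def sampling_grid(flow):
    # Normalized identity grid plus the predicted flow, for grid_sample.
    n, _, h, w = flow.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).expand(n, h, w, 2)
    return grid + flow.permute(0, 2, 3, 1)

class JointPipeline(nn.Module):
    def __init__(self, extractor: nn.Module, registrator: nn.Module):
        super().__init__()
        self.extractor = extractor      # predicts a soft brain mask
        self.registrator = registrator  # predicts a dense 2-channel flow

    def forward(self, scan, atlas, atlas_labels):
        mask = torch.sigmoid(self.extractor(scan))     # extraction step
        stripped = scan * mask                         # skull-stripped scan
        flow = self.registrator(torch.cat([stripped, atlas], dim=1))
        warped = F.grid_sample(stripped, sampling_grid(flow),
                               align_corners=True)     # registration step
        # After alignment, the atlas labels act as the segmentation in atlas
        # space (warping them back with the inverse flow, omitted here,
        # labels the original scan); the similarity between `warped` and
        # `atlas` provides the self-supervision that couples all three tasks.
        return mask, warped, atlas_labels
```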

BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks

May 26, 2023
Kai Zhang, Jun Yu, Zhiling Yan, Yixin Liu, Eashan Adhikarla, Sunyang Fu, Xun Chen, Chen Chen, Yuyin Zhou, Xiang Li, Lifang He, Brian D. Davison, Quanzheng Li, Yong Chen, Hongfang Liu, Lichao Sun

In this paper, we introduce a unified and generalist Biomedical Generative Pre-trained Transformer (BiomedGPT) model, which leverages self-supervision on large and diverse datasets to accept multi-modal inputs and perform a range of downstream tasks. Our experiments demonstrate that BiomedGPT delivers expansive and inclusive representations of biomedical data, outperforming the majority of preceding state-of-the-art models across five distinct tasks on 20 public datasets spanning over 15 unique biomedical modalities. Through ablation studies, we also demonstrate the efficacy of our multi-modal and multi-task pretraining approach in transferring knowledge to previously unseen data. Overall, our work presents a significant step forward in developing unified and generalist models for biomedicine, with far-reaching implications for improving healthcare outcomes.

* work in progress 

Deep Multi-View Subspace Clustering with Anchor Graph

May 11, 2023
Chenhang Cui, Yazhou Ren, Jingyu Pu, Xiaorong Pu, Lifang He

Deep multi-view subspace clustering (DMVSC) has recently attracted increasing attention due to its promising performance. However, existing DMVSC methods still have two issues: (1) they mainly focus on using autoencoders to nonlinearly embed the data, yet the embedding may be suboptimal for clustering because the clustering objective is rarely considered in autoencoders; and (2) existing methods typically have quadratic or even cubic complexity, which makes it challenging to deal with large-scale data. To address these issues, in this paper we propose a novel deep multi-view subspace clustering method with anchor graph (DMCAG). Specifically, DMCAG first learns the embedded features for each view independently, which are used to obtain the subspace representations. To significantly reduce the complexity, we construct a small anchor graph for each view. Spectral clustering is then performed on an integrated anchor graph to obtain pseudo-labels. To overcome the negative impact of suboptimal embedded features, we use the pseudo-labels to refine the embedding process and make it more suitable for the clustering task; pseudo-labels and embedded features are updated alternately. Furthermore, we design a contrastive learning-based strategy to maintain label consistency and enhance the clustering performance. Empirical studies on real-world datasets show that our method achieves superior clustering performance over other state-of-the-art methods.
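
The anchor-graph construction that gives DMCAG its scalability reduces to a few lines. The sketch below is a generic single-view illustration (k-means anchors, Gaussian affinities), not the authors' implementation, and omits the deep embedding, pseudo-label refinement and contrastive consistency losses; its point is that with m << n anchors, spectral clustering operates on an n x m matrix instead of an n x n one.

```python
# Generic anchor-graph spectral clustering, single view, illustrative only.
import numpy as np
from sklearn.cluster import KMeans

def anchor_spectral_clustering(X, n_anchors=50, n_clusters=10, gamma=1.0):
    anchors = KMeans(n_clusters=n_anchors, n_init=10,
                     random_state=0).fit(X).cluster_centers_
    # Soft assignment of each sample to anchors (rows sum to 1).
    d2 = ((X[:, None, :] - anchors[None]) ** 2).sum(-1)
    Z = np.exp(-gamma * d2)
    Z /= Z.sum(axis=1, keepdims=True)
    # Eigenvectors of the implicit n x n graph Z diag(Z^T 1)^{-1} Z^T are
    # exactly the left singular vectors of the column-normalized Z, so the
    # expensive n x n eigendecomposition is never formed.
    U, _, _ = np.linalg.svd(Z / np.sqrt(Z.sum(axis=0)), full_matrices=False)
    return KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=0).fit_predict(U[:, :n_clusters])
```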

Hierarchical State Abstraction Based on Structural Information Principles

Apr 24, 2023
Xianghua Zeng, Hao Peng, Angsheng Li, Chunyang Liu, Lifang He, Philip S. Yu

State abstraction optimizes decision-making by ignoring irrelevant environmental information in reinforcement learning with rich observations. Nevertheless, recent approaches focus on adequate representational capacity, which results in essential information loss and hurts their performance on challenging tasks. In this article, we propose a novel mathematical Structural Information principles-based State Abstraction framework, namely SISA, from an information-theoretic perspective. Specifically, we present an unsupervised, adaptive hierarchical state clustering method that requires no manual assistance and generates an optimal encoding tree. On each non-root tree node, a new aggregation function and a conditional structural entropy are designed to achieve hierarchical state abstraction and to compensate for the sampling-induced essential information loss in state abstraction. Empirical evaluations on a visual gridworld domain and six continuous control benchmarks demonstrate that, compared with five SOTA state abstraction approaches, SISA significantly improves mean episode reward and sample efficiency by up to 18.98 and 44.44%, respectively. Moreover, we experimentally show that SISA is a general framework that can be flexibly integrated with different representation-learning objectives to further improve their performance.
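
For reference, the objective that structural-information methods of this kind minimize when building an encoding tree is the structural entropy of Li and Pan's structural information theory; the formula below restates that standard definition and is not quoted from this abstract.

```latex
% Structural entropy of a graph G = (V, E) under an encoding tree T,
% the quantity minimized to obtain the optimal encoding tree:
H^{T}(G) = -\sum_{\alpha \in T,\ \alpha \neq \lambda}
    \frac{g_{\alpha}}{\operatorname{vol}(G)}
    \log_{2} \frac{\operatorname{vol}(\alpha)}{\operatorname{vol}(\alpha^{-})}
% lambda: the root of T; alpha^-: the parent of tree node alpha;
% g_alpha: total weight of edges with exactly one endpoint in the vertex
%          set attached to alpha;
% vol(.): sum of the degrees of the vertices in the given set.
```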

A Comparison of Image Denoising Methods

Apr 18, 2023
Zhaoming Kong, Fangxi Deng, Haomin Zhuang, Xiaowei Yang, Jun Yu, Lifang He

The advancement of imaging devices and the countless images generated every day pose an increasingly high demand on image denoising, which remains a challenging task in terms of both effectiveness and efficiency. To improve denoising quality, numerous denoising techniques have been proposed over the past decades, including different transforms, regularization terms, algebraic representations and, especially, advanced deep neural network (DNN) architectures. Despite their sophistication, many methods fail to achieve desirable results for simultaneous noise removal and fine-detail preservation. In this paper, to investigate the applicability of existing denoising techniques, we compare a variety of denoising methods on both synthetic and real-world datasets for different applications. We also introduce a new dataset for benchmarking, and the evaluations are performed from four different perspectives: quantitative metrics, visual effects, human ratings and computational cost. Our experiments demonstrate: (i) the effectiveness and efficiency of representative traditional denoisers for various denoising tasks; (ii) that a simple matrix-based algorithm can produce results comparable to its tensor counterparts; and (iii) the notable achievements of DNN models, which exhibit impressive generalization ability and state-of-the-art performance on various datasets. Despite the progress of recent years, we also discuss the shortcomings and possible extensions of existing techniques. Datasets, code and results are publicly available and will be continuously updated at https://github.com/ZhaomingKong/Denoising-Comparison.

* In this paper, we intend to collect and compare various denoising methods to investigate their effectiveness, efficiency, applicability and generalization ability with both synthetic and real-world experiments 
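
A comparison of this shape reduces to a small benchmarking harness. The sketch below uses two classical scikit-image denoisers as stand-ins for the compared methods (it is not the paper's exact protocol) and scores each on PSNR, SSIM and wall-clock cost, three of the four evaluation perspectives named above.

```python
# Minimal benchmarking harness; the denoiser dict is a stand-in for the
# full method roster compared in the paper.
import time
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity
from skimage.restoration import denoise_tv_chambolle, denoise_wavelet

def benchmark(clean, noisy, denoisers):
    """clean, noisy: float grayscale images in [0, 1]."""
    rows = []
    for name, fn in denoisers.items():
        t0 = time.perf_counter()
        out = fn(noisy)                       # run the denoiser
        cost = time.perf_counter() - t0       # computational cost
        rows.append((name,
                     peak_signal_noise_ratio(clean, out, data_range=1.0),
                     structural_similarity(clean, out, data_range=1.0),
                     cost))
    return rows

rng = np.random.default_rng(0)
clean = np.linspace(0, 1, 64 * 64).reshape(64, 64)
noisy = np.clip(clean + rng.normal(0, 0.1, clean.shape), 0, 1)
for row in benchmark(clean, noisy, {"tv": denoise_tv_chambolle,
                                    "wavelet": denoise_wavelet}):
    print("%-8s psnr=%.2f ssim=%.3f time=%.4fs" % row)
```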

Self-Paced Neutral Expression-Disentangled Learning for Facial Expression Recognition

Mar 21, 2023
Zhenqian Wu, Xiaoyuan Li, Yazhou Ren, Xiaorong Pu, Xiaofeng Zhu, Lifang He

The accuracy of facial expression recognition is typically affected by the following factors: high similarity across different expressions, disturbing factors, and rapid, subtle micro-facial movements. One potentially viable solution for addressing these barriers is to exploit the neutral information concealed in neutral expression images. To this end, in this paper we propose a self-Paced Neutral Expression-Disentangled Learning (SPNDL) model. SPNDL disentangles neutral information from facial expressions, making it easier to extract key and deviation features; specifically, it allows the model to capture discriminative information among similar expressions and to perceive micro-facial movements. To better learn these neutral expression-disentangled features (NDFs) and to alleviate the non-convex optimization problem, a self-paced learning (SPL) strategy based on NDFs is proposed for the training stage. SPL learns samples from easy to complex by gradually increasing the number of samples selected for training, which effectively suppresses the negative impact of low-quality samples and inconsistently distributed NDFs. Experiments on three popular databases (i.e., CK+, Oulu-CASIA, and RAF-DB) show the effectiveness of the proposed method.
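
The easy-to-complex selection can be made concrete with a generic hard-threshold self-paced loop. The PyTorch sketch below is an assumed simplification of such a training stage, not the authors' exact scheduler: samples whose loss falls below the current age parameter are admitted, and the threshold grows so that more samples enter over time.

```python
# Generic hard-threshold self-paced learning loop (assumed simplification).
import torch

def spl_weights(losses: torch.Tensor, lam: float) -> torch.Tensor:
    # Binary self-paced weights: 1 for "easy" samples (loss < lam), else 0.
    return (losses < lam).float()

def train_epoch(model, loader, optimizer, criterion, lam):
    for x, y in loader:
        optimizer.zero_grad()
        losses = criterion(model(x), y)        # per-sample losses (reduction="none")
        w = spl_weights(losses.detach(), lam)  # selection itself is not differentiated
        if w.sum() > 0:
            (w * losses).sum().div(w.sum()).backward()
            optimizer.step()

# Typical schedule: grow lam so more (harder) samples enter over epochs, e.g.
# for epoch in range(epochs):
#     train_epoch(model, loader, opt,
#                 torch.nn.CrossEntropyLoss(reduction="none"), lam)
#     lam *= 1.1
```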

A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT

Feb 18, 2023
Ce Zhou, Qian Li, Chen Li, Jun Yu, Yixin Liu, Guangjing Wang, Kai Zhang, Cheng Ji, Qiben Yan, Lifang He, Hao Peng, Jianxin Li, Jia Wu, Ziwei Liu, Pengtao Xie, Caiming Xiong, Jian Pei, Philip S. Yu, Lichao Sun

Pretrained Foundation Models (PFMs) are regarded as the foundation for various downstream tasks across different data modalities. A PFM, such as BERT, GPT-3, MAE, DALL-E, or ChatGPT, is trained on large-scale data and provides a reasonable parameter initialization for a wide range of downstream applications. The idea of pretraining behind PFMs plays an important role in the application of large models. Different from previous methods that apply convolutional and recurrent modules for feature extraction, the generative pretraining (GPT) method applies a Transformer as the feature extractor and is trained on large datasets with an autoregressive paradigm. Similarly, BERT applies Transformers trained on large datasets as a contextual language model. Recently, ChatGPT has shown the promising success of large language models, applying an autoregressive language model with zero-shot or few-shot prompting. With the extraordinary success of PFMs, AI has made waves in a variety of fields over the past few years. As considerable methods, datasets, and evaluation metrics have been proposed in the literature, the need for an updated survey is rising. This study provides a comprehensive review of recent research advancements, current and future challenges, and opportunities for PFMs in text, image, graph, and other data modalities. We first review the basic components and existing pretraining methods in natural language processing, computer vision, and graph learning. We then discuss advanced PFMs for other data modalities and unified PFMs that account for data quality and quantity. In addition, we discuss research on the fundamentals of PFMs, including model efficiency and compression, security, and privacy. Finally, we lay out key implications, future research directions, challenges, and open problems.

* 97 pages, 16 figures 

ERNet: Unsupervised Collective Extraction and Registration in Neuroimaging Data

Dec 06, 2022
Yao Su, Zhentian Qian, Lifang He, Xiangnan Kong

Brain extraction and registration are important preprocessing steps in neuroimaging data analysis, where the goal is to extract the brain regions from MRI scans (i.e., extraction step) and align them with a target brain image (i.e., registration step). Conventional research mainly focuses on developing methods for the extraction and registration tasks separately under supervised settings. The performance of these methods depends heavily on the number of training samples and the visual inspections performed by experts for error correction. However, in many medical studies, collecting voxel-level labels and conducting manual quality control on high-dimensional neuroimages (e.g., 3D MRI) are very expensive and time-consuming. Moreover, brain extraction and registration are highly related tasks in neuroimaging data and should be solved collectively. In this paper, we study the problem of unsupervised collective extraction and registration in neuroimaging data. We propose a unified end-to-end framework, called ERNet (Extraction-Registration Network), to jointly optimize the extraction and registration tasks, allowing feedback between them. Specifically, we use a pair of multi-stage extraction and registration modules to learn the extraction mask and transformation, where the extraction network improves the extraction accuracy incrementally and the registration network successively warps the extracted image until it is well aligned with the target image. Experimental results on real-world datasets show that our proposed method can effectively improve the performance of extraction and registration in neuroimaging data. Our code and data can be found at https://github.com/ERNetERNet/ERNet

* Published as a research track paper at KDD 2022. Code: https://github.com/ERNetERNet/ERNet 
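
The multi-stage interplay can be sketched as a loop over paired modules. The PyTorch code below uses illustrative names rather than the released ERNet code (the `warp` argument is any differentiable resampler, e.g. a grid_sample-based warp) and shows how extraction and registration refine each other stage by stage.

```python
# Schematic multi-stage extraction-registration loop (illustrative wiring,
# not the released ERNet code).
import torch
import torch.nn as nn

class MultiStageER(nn.Module):
    def __init__(self, ext_stages, reg_stages, warp):
        super().__init__()
        self.ext_stages = nn.ModuleList(ext_stages)  # one extractor per stage
        self.reg_stages = nn.ModuleList(reg_stages)  # one registrator per stage
        self.warp = warp                             # differentiable resampler

    def forward(self, source, target):
        moving = source
        for ext, reg in zip(self.ext_stages, self.reg_stages):
            mask = torch.sigmoid(ext(moving))              # refine extraction
            flow = reg(torch.cat([moving * mask, target], dim=1))
            moving = self.warp(moving * mask, flow)        # refine alignment
        # Trained end-to-end with a similarity loss against `target`,
        # which is the feedback that couples the two tasks.
        return moving
```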

ABN: Anti-Blur Neural Networks for Multi-Stage Deformable Image Registration

Dec 06, 2022
Yao Su, Xin Dai, Lifang He, Xiangnan Kong

Deformable image registration, i.e., the task of aligning multiple images into one coordinate system by non-linear transformation, serves as an essential preprocessing step for neuroimaging data. Recent research on deformable image registration mainly focuses on improving registration accuracy using multi-stage alignment methods, where the source image is repeatedly deformed in stages by the same neural network until it is well aligned with the target image. Conventional methods for multi-stage registration often blur the source image, as the pixel/voxel values are repeatedly interpolated from the image generated by the previous stage. However, maintaining image quality, such as sharpness, during registration is crucial for medical data analysis. In this paper, we study the problem of anti-blur deformable image registration and propose a novel solution, called Anti-Blur Network (ABN), for multi-stage image registration. Specifically, we use a pair of short-term registration and long-term memory networks to learn the nonlinear deformations at each stage: the short-term registration network learns how to improve registration accuracy incrementally, while the long-term memory network combines all previous deformations, allowing interpolation to be performed directly on the raw image and preserving its sharpness. Extensive experiments on both natural and medical image datasets demonstrate that ABN can accurately register images while preserving their sharpness. Our code and data can be found at https://github.com/anonymous3214/ABN

* Published as a full paper at ICDM 2022. Code: https://github.com/anonymous3214/ABN 
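
The sharpness-preserving idea amounts to composing flows instead of images. The sketch below is an assumed 2D simplification rather than the released ABN code: per-stage deformations are accumulated in flow space, so the raw source image is interpolated exactly once at the end.

```python
# Assumed 2D simplification of flow composition (not the released ABN code).
import torch
import torch.nn.functional as F

def identity_grid(n, h, w):
    # Normalized identity sampling grid for grid_sample, shape (n, h, w, 2).
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    return torch.stack([xs, ys], dim=-1).expand(n, h, w, 2)

def compose(total_flow, new_flow):
    # Flow-space composition: resample the accumulated flow by the new one,
    # then add. The cheap interpolation happens on flows, never on the image.
    n, h, w, _ = total_flow.shape
    grid = identity_grid(n, h, w) + new_flow
    warped_total = F.grid_sample(total_flow.permute(0, 3, 1, 2), grid,
                                 align_corners=True).permute(0, 2, 3, 1)
    return warped_total + new_flow

def register(source, stage_flows):
    """source: (N, C, H, W); stage_flows: list of (N, H, W, 2) flows."""
    n, _, h, w = source.shape
    total = torch.zeros(n, h, w, 2)
    for flow in stage_flows:          # long-term memory: accumulate flows
        total = compose(total, flow)
    # A single interpolation on the raw source preserves sharpness.
    return F.grid_sample(source, identity_grid(n, h, w) + total,
                         align_corners=True)
```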