Zongyuan Ge

Ugly Ducklings or Swans: A Tiered Quadruplet Network with Patient-Specific Mining for Improved Skin Lesion Classification

Sep 18, 2023
Nathasha Naranpanawa, H. Peter Soyer, Adam Mothershaw, Gayan K. Kulatilleke, Zongyuan Ge, Brigid Betz-Stablein, Shekhar S. Chandra

An ugly duckling is a skin lesion that is obviously different from the surrounding lesions of an individual, and the ugly duckling sign is a criterion used to aid the diagnosis of cutaneous melanoma by differentiating highly suspicious lesions from benign ones. However, the appearance of pigmented lesions can change drastically from one patient to another, making visual separation of ugly ducklings difficult. Hence, we propose DMT-Quadruplet, a deep metric learning network that learns lesion features at two tiers: patient level and lesion level. We introduce a patient-specific quadruplet mining approach together with a tiered quadruplet network, driving the network to learn more contextual information both globally and locally across the two tiers. We further incorporate a dynamic margin within the patient-specific mining so that more useful quadruplets can be mined within individuals. Comprehensive experiments show that our proposed method outperforms traditional classifiers, achieving 54% higher sensitivity than a baseline ResNet18 CNN and 37% higher than a naive triplet network in classifying ugly duckling lesions. Visualisation of the data manifold in the metric space further illustrates that DMT-Quadruplet successfully classifies ugly duckling lesions in both a patient-specific and a patient-agnostic manner.
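
To make the metric-learning idea concrete, here is a minimal PyTorch sketch of a quadruplet loss with a dynamic margin. The tiered patient-level/lesion-level mining of DMT-Quadruplet is not reproduced here, and the margin-scaling rule below is a hypothetical stand-in for illustration only.

```python
import torch.nn.functional as F

def quadruplet_loss(anchor, positive, neg1, neg2, base_margin=0.5):
    """Quadruplet loss over embedding batches of shape (B, D).

    anchor/positive share a class; neg1 and neg2 come from two other
    classes. The second term separates pairs that exclude the anchor,
    tightening the metric space as a whole.
    """
    d_ap = F.pairwise_distance(anchor, positive)
    d_an = F.pairwise_distance(anchor, neg1)
    d_nn = F.pairwise_distance(neg1, neg2)

    # Hypothetical dynamic margin: grow the margin when the positive
    # pair is already far apart, so hard quadruplets remain informative.
    margin1 = base_margin * (1.0 + d_ap.detach())
    margin2 = 0.5 * margin1

    return (F.relu(d_ap - d_an + margin1) +
            F.relu(d_ap - d_nn + margin2)).mean()
```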

* 12 pages, 6 figures 

Privacy-preserving Early Detection of Epileptic Seizures in Videos

Sep 15, 2023
Deval Mehta, Shobi Sivathamboo, Hugh Simpson, Patrick Kwan, Terence O'Brien, Zongyuan Ge

In this work, we contribute towards the development of video-based epileptic seizure classification by introducing a novel framework (SETR-PKD) that achieves privacy-preserving early detection of seizures in videos. Our framework has two significant components: (1) it is built upon optical flow features extracted from the video of a seizure, which encode the seizure motion semiotics while preserving the privacy of the patient; (2) it utilizes transformer-based progressive knowledge distillation, where knowledge is gradually distilled from networks trained on longer portions of video samples to networks that operate on shorter portions. Our proposed framework thus addresses the limitations of current approaches, which compromise patient privacy by operating directly on the RGB video of a seizure and impede real-time detection by requiring the full video sample to make a prediction. SETR-PKD can detect tonic-clonic seizures (TCSs) in a privacy-preserving manner with an accuracy of 83.9% while they are only half-way into their progression. Our data and code are available at https://github.com/DevD1092/seizure-detection
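
A rough sketch of the two components follows, assuming OpenCV's Farneback flow for the privacy-preserving input and a standard soft-label distillation term; the actual SETR-PKD architecture and distillation schedule are in the linked repository.

```python
import cv2
import numpy as np
import torch.nn.functional as F

def to_optical_flow(frames):
    """Replace RGB frames with dense optical flow fields, keeping the
    seizure motion semiotics while discarding identifiable appearance."""
    grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames]
    flows = [cv2.calcOpticalFlowFarneback(a, b, None, 0.5, 3, 15, 3, 5, 1.2, 0)
             for a, b in zip(grays, grays[1:])]
    return np.stack(flows)  # (T-1, H, W, 2) motion-only representation

def distill_loss(student_logits, teacher_logits, T=2.0):
    """Soft-label KD: a student seeing only a short prefix of the video
    matches a teacher trained on a longer portion."""
    p_teacher = F.softmax(teacher_logits.detach() / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```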

* Accepted to MICCAI 2023 

EndoSurf: Neural Surface Reconstruction of Deformable Tissues with Stereo Endoscope Videos

Jul 21, 2023
Ruyi Zha, Xuelian Cheng, Hongdong Li, Mehrtash Harandi, Zongyuan Ge

Reconstructing soft tissues from stereo endoscope videos is an essential prerequisite for many medical applications. Previous methods struggle to produce high-quality geometry and appearance due to their inadequate representations of 3D scenes. To address this issue, we propose a novel neural-field-based method, called EndoSurf, which effectively learns to represent a deforming surface from an RGBD sequence. In EndoSurf, we model surface dynamics, shape, and texture with three neural fields. First, 3D points are transformed from the observed space to the canonical space using the deformation field. The signed distance function (SDF) field and radiance field then predict their SDFs and colors, respectively, with which RGBD images can be synthesized via differentiable volume rendering. We constrain the learned shape by tailoring multiple regularization strategies and disentangling geometry and appearance. Experiments on public endoscope datasets demonstrate that EndoSurf significantly outperforms existing solutions, particularly in reconstructing high-fidelity shapes. Code is available at https://github.com/Ruyi-Zha/endosurf.git.
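
The three neural fields can be sketched as small MLPs, as below; positional encodings, the regularization strategies, and the volume-rendering integration are omitted, and the layer sizes are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

def mlp(d_in, d_out, width=128, depth=3):
    layers, d = [], d_in
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    return nn.Sequential(*layers, nn.Linear(d, d_out))

class EndoSurfSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.deform = mlp(3 + 1, 3)    # (point, time) -> canonical offset
        self.sdf = mlp(3, 1)           # canonical point -> signed distance
        self.radiance = mlp(3 + 3, 3)  # (canonical point, view dir) -> RGB

    def forward(self, x, t, view_dir):
        # Warp observed points into the canonical space, then query
        # geometry (SDF) and appearance (radiance) there.
        x_canon = x + self.deform(torch.cat([x, t], dim=-1))
        sdf = self.sdf(x_canon)
        rgb = torch.sigmoid(self.radiance(torch.cat([x_canon, view_dir], -1)))
        return sdf, rgb  # consumed by differentiable volume rendering
```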

* MICCAI 2023 (Early Accept); Ruyi Zha and Xuelian Cheng made equal contributions. Corresponding author: Ruyi Zha (ruyi.zha@gmail.com) 

LMPT: Prompt Tuning with Class-Specific Embedding Loss for Long-tailed Multi-Label Visual Recognition

May 08, 2023
Peng Xia, Di Xu, Lie Ju, Ming Hu, Jun Chen, Zongyuan Ge

Long-tailed multi-label visual recognition (LTML) is highly challenging due to label co-occurrence and imbalanced data distribution. In this work, we propose a unified framework for LTML, namely prompt tuning with class-specific embedding loss (LMPT), which captures semantic feature interactions between categories by combining text and image modalities and improves performance on head and tail classes simultaneously. Specifically, LMPT introduces an embedding loss function with a class-aware soft margin and re-weighting to learn class-specific contexts with the benefit of textual descriptions (captions), which helps establish semantic relationships between classes, especially between head and tail classes. Furthermore, to account for class imbalance, the distribution-balanced loss is adopted as the classification loss to further improve performance on tail classes without compromising head classes. Extensive experiments on the VOC-LT and COCO-LT datasets demonstrate that the proposed method significantly surpasses previous state-of-the-art methods and zero-shot CLIP in LTML. Our code is available at https://github.com/richard-peng-xia/LMPT.
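
As a hedged illustration of a class-aware soft margin with re-weighting (not LMPT's exact formulation), one could scale both the margin and the loss weight of each class by its inverse training frequency:

```python
import torch.nn.functional as F

def class_embedding_loss(img_feats, text_feats, labels, class_freq, scale=20.0):
    """Sketch only. img_feats: (B, D); text_feats: (C, D) class text
    embeddings; labels: (B, C) multi-hot; class_freq: (C,) class counts.
    Rarer classes get a larger margin and a larger weight (assumed rule)."""
    labels = labels.float()
    sim = F.normalize(img_feats, dim=1) @ F.normalize(text_feats, dim=1).t()
    margin = 0.1 * (class_freq.max() / class_freq).log1p()  # class-aware
    weight = class_freq.sum() / class_freq                  # re-weighting
    logits = scale * (sim - labels * margin)  # margin on positive pairs only
    return F.binary_cross_entropy_with_logits(
        logits, labels, pos_weight=weight / weight.mean())
```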

TPMIL: Trainable Prototype Enhanced Multiple Instance Learning for Whole Slide Image Classification

May 01, 2023
Litao Yang, Deval Mehta, Sidong Liu, Dwarikanath Mahapatra, Antonio Di Ieva, Zongyuan Ge

Digital pathology based on whole slide images (WSIs) plays a key role in cancer diagnosis and clinical practice. Due to the high resolution of WSIs and the unavailability of patch-level annotations, WSI classification is usually formulated as a weakly supervised problem, relying on multiple instance learning (MIL) over the patches of a WSI. In this paper, we aim to learn an optimal patch-level feature space by integrating prototype learning with MIL. To this end, we develop a Trainable Prototype enhanced deep MIL (TPMIL) framework for weakly supervised WSI classification. In contrast to conventional methods, which rely on a certain number of selected patches for feature space refinement, we softly cluster all instances by allocating them to their corresponding prototypes. Additionally, our method can reveal correlations between different tumor subtypes through the distances between the corresponding trained prototypes. More importantly, TPMIL also provides more accurate interpretability based on the distances of the instances from the trained prototypes, serving as an alternative to conventional attention-score-based interpretability. We test our method on two WSI datasets, on which it achieves new state-of-the-art results. GitHub repository: https://github.com/LitaoYang-Jet/TPMIL
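
A minimal sketch of the soft prototype assignment, assuming pre-extracted patch features; the prototype count, distance choice, and pooling are illustrative assumptions rather than TPMIL's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeMIL(nn.Module):
    def __init__(self, feat_dim=512, n_prototypes=8, n_classes=2):
        super().__init__()
        # Prototypes are trainable and learned jointly with the classifier.
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, feat_dim))
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, patches):                        # (N, feat_dim)
        # Softly cluster every instance: closer prototypes get more weight.
        dist = torch.cdist(patches, self.prototypes)   # (N, P)
        assign = F.softmax(-dist, dim=1)               # soft assignment
        slots = assign.t() @ patches                   # (P, feat_dim)
        bag = slots.mean(dim=0)                        # bag-level feature
        # `assign` doubles as an interpretability signal: instances near a
        # tumor-subtype prototype can be highlighted on the slide.
        return self.classifier(bag), assign
```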

* Accepted for MIDL 2023 

EPVT: Environment-aware Prompt Vision Transformer for Domain Generalization in Skin Lesion Recognition

Apr 09, 2023
Siyuan Yan, Chi Liu, Zhen Yu, Lie Ju, Dwarikanath Mahapatra, Victoria Mar, Monika Janda, Peter Soyer, Zongyuan Ge

Skin lesion recognition using deep learning has made remarkable progress, and there is an increasing need to deploy these systems in real-world scenarios. However, recent research has revealed that deep neural networks for skin lesion recognition may depend excessively on disease-irrelevant image artifacts (e.g., dark corners, dense hairs), leading to poor generalization in unseen environments. To address this issue, we propose a novel domain generalization method called EPVT, which embeds prompts into a vision transformer to collaboratively learn knowledge from diverse domains. Concretely, EPVT leverages a set of domain prompts, each of which acts as a domain expert capturing domain-specific knowledge, and a shared prompt for general knowledge over the entire dataset. To facilitate knowledge sharing and the interaction of different prompts, we introduce a domain prompt generator that enables low-rank multiplicative updates between the domain prompts and the shared prompt. A domain mixup strategy is additionally devised to reduce the co-occurring artifacts in each domain, which allows for more flexible decision margins and mitigates the issue of incorrectly assigned domain labels. Experiments on four out-of-distribution datasets and six different biased ISIC datasets demonstrate the superior generalization ability of EPVT in skin lesion recognition across various environments. Our code and dataset will be released at https://github.com/SiyuanYan1/EPVT.
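
A sketch of the prompt interaction, under the assumption that the generator applies a shared low-rank multiplicative update to each domain prompt; the token counts, rank, and initialization are illustrative, and EPVT's released code is the reference implementation.

```python
import torch
import torch.nn as nn

class DomainPromptGenerator(nn.Module):
    def __init__(self, n_domains, n_tokens=4, dim=768, rank=4):
        super().__init__()
        self.shared = nn.Parameter(torch.randn(n_tokens, dim) * 0.02)
        self.domain = nn.Parameter(torch.randn(n_domains, n_tokens, dim) * 0.02)
        self.down = nn.Linear(dim, rank, bias=False)  # low-rank factor A
        self.up = nn.Linear(rank, dim, bias=False)    # low-rank factor B

    def forward(self, domain_id):
        # Condition the domain-expert prompt on the shared prompt via a
        # low-rank multiplicative update, so general and domain-specific
        # knowledge interact instead of being learned in isolation.
        update = self.up(self.down(self.shared))       # (n_tokens, dim)
        expert = self.domain[domain_id] * (1.0 + update)
        # Both prompt sets are prepended to the ViT token sequence downstream.
        return torch.cat([self.shared, expert], dim=0)
```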

* 12 pages, 5 figures 

Towards Open-Scenario Semi-supervised Medical Image Classification

Apr 08, 2023
Lie Ju, Yicheng Wu, Wei Feng, Zhen Yu, Lin Wang, Zhuoting Zhu, Zongyuan Ge

Semi-supervised learning (SSL) has attracted much attention because it reduces the expensive cost of collecting adequate well-labeled training data, especially for deep learning methods. However, traditional SSL is built on the assumption that labeled and unlabeled data come from the same distribution, e.g., the same classes and domains. In practical scenarios, unlabeled data may come from unseen classes or unseen domains, and exploiting such data remains challenging for existing SSL methods. In this paper, we therefore propose a unified framework to leverage these unseen unlabeled data for open-scenario semi-supervised medical image classification. We first design a novel scoring mechanism, called dual-path outlier estimation, to identify samples from unseen classes. To extract unseen-domain samples, we then apply an effective variational autoencoder (VAE) pre-training. After that, we conduct domain adaptation to fully exploit the value of the detected unseen-domain samples and boost semi-supervised training. We evaluate our proposed framework on dermatology and ophthalmology tasks. Extensive experiments demonstrate that our model achieves superior classification performance in various medical SSL scenarios.
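
The dual-path idea can be sketched as two thresholded scores, assuming a max-softmax confidence for unseen classes and a VAE reconstruction error for unseen domains; the thresholding rule here is an assumption.

```python
import torch.nn.functional as F

def dual_path_outliers(logits, recon_err, cls_thresh=0.5):
    """Sketch of a dual-path outlier estimation rule.

    Path 1: low maximum softmax probability flags possible unseen-CLASS
    samples. Path 2: high reconstruction error from a VAE pre-trained on
    in-domain images flags possible unseen-DOMAIN samples.
    """
    conf = F.softmax(logits, dim=1).max(dim=1).values    # (N,)
    dom_thresh = recon_err.mean() + 2 * recon_err.std()  # assumed rule
    unseen_class = conf < cls_thresh
    unseen_domain = recon_err > dom_thresh
    return unseen_class, unseen_domain  # routed to domain adaptation / SSL
```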

Towards Trustable Skin Cancer Diagnosis via Rewriting Model's Decision

Mar 02, 2023
Siyuan Yan, Zhen Yu, Xuelin Zhang, Dwarikanath Mahapatra, Shekhar S. Chandra, Monika Janda, Peter Soyer, Zongyuan Ge

Deep neural networks have demonstrated promising performance on image recognition tasks. However, they may rely heavily on confounding factors, using irrelevant artifacts or bias within the dataset as cues to improve performance. When a model makes decisions based on these spurious correlations, it becomes untrustworthy and can lead to catastrophic outcomes when deployed in real-world scenarios. In this paper, we explore and attempt to solve this problem in the context of skin cancer diagnosis. We introduce a human-in-the-loop framework into the model training process, so that users can observe and correct the model's decision logic when confounding behaviors occur. Specifically, our method can automatically discover confounding factors by analyzing the co-occurrence behavior of the samples. It is capable of learning confounding concepts from easily obtained concept exemplars. By mapping the black-box model's feature representation onto an explainable concept space, human users can interpret the concepts and intervene via first-order logic instructions. We systematically evaluate our method on our newly crafted, well-controlled skin lesion dataset and several public skin lesion datasets. Experiments show that our method can effectively detect and remove confounding factors from datasets without any prior knowledge of the category distribution, and it does not require fully annotated concept labels. We also show that our method enables the model to focus on clinically relevant concepts, improving the model's performance and trustworthiness during inference.
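
One form such an intervention could take (a sketch, not the paper's full first-order-logic machinery) is projecting a confounding concept direction out of the feature representation:

```python
import torch.nn.functional as F

def project_out_concept(features, concept_vector):
    """Remove the component of each feature vector that lies along a
    learned confounding-concept direction (e.g. a hypothetical 'dark
    corner' artifact concept estimated from concept exemplars).

    features: (B, D); concept_vector: (D,).
    """
    c = F.normalize(concept_vector, dim=0)
    coeff = features @ c                       # (B,) concept activations
    return features - coeff.unsqueeze(1) * c   # now orthogonal to concept
```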

* Accepted by CVPR 2023 

Multimorbidity Content-Based Medical Image Retrieval Using Proxies

Nov 22, 2022
Yunyan Xing, Benjamin J. Meyer, Mehrtash Harandi, Tom Drummond, Zongyuan Ge

Content-based medical image retrieval is an important diagnostic tool that improves the explainability of computer-aided diagnosis systems and provides decision-making support to healthcare professionals. Medical imaging data, such as radiology images, often exhibit multimorbidity: a single sample may have more than one pathology present. As such, image retrieval systems for the medical domain must be designed for the multi-label scenario. In this paper, we propose a novel multi-label metric learning method that can be used for both classification and content-based image retrieval. In this way, our model is able to support diagnosis by predicting the presence of diseases and to provide evidence for these predictions by returning samples with similar pathological content to the user. In practice, the retrieved images may also be accompanied by pathology reports, further assisting in the diagnostic process. Our method leverages proxy feature vectors, enabling the efficient learning of a robust feature space in which the distance between feature vectors can be used as a measure of the similarity of those samples. Unlike existing proxy-based methods, training samples can be assigned to multiple proxies that span multiple class labels. This multi-label proxy assignment results in a feature space that encodes the complex relationships between the diseases present in medical imaging data. Our method outperforms state-of-the-art image retrieval systems and a set of baseline approaches. We demonstrate the efficacy of our approach on both classification and content-based image retrieval on two multimorbidity radiology datasets.
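
A hedged sketch of multi-label proxy assignment, extending a proxy-NCA-style objective so that each sample is pulled toward the proxies of all of its positive labels; the temperature and averaging scheme are assumptions.

```python
import torch.nn.functional as F

def multilabel_proxy_loss(embeddings, proxies, labels, temp=0.1):
    """embeddings: (B, D); proxies: (C, D), one per class label;
    labels: (B, C) multi-hot. Each sample is attracted to every proxy
    of its positive labels and repelled from the remaining proxies."""
    labels = labels.float()
    sim = F.normalize(embeddings, dim=1) @ F.normalize(proxies, dim=1).t()
    log_p = F.log_softmax(sim / temp, dim=1)
    # Average log-likelihood over each sample's positive proxies.
    loss = -(labels * log_p).sum(dim=1) / labels.sum(dim=1).clamp(min=1)
    return loss.mean()
```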
