Yutong Xie

BHSD: A 3D Multi-Class Brain Hemorrhage Segmentation Dataset

Aug 23, 2023
Biao Wu, Yutong Xie, Zeyu Zhang, Jinchao Ge, Kaspar Yaxley, Suzan Bahadir, Qi Wu, Yifan Liu, Minh-Son To

Intracranial hemorrhage (ICH) is a pathological condition characterized by bleeding inside the skull or brain, which can be attributed to various factors. Identifying, localizing, and quantifying ICH has important clinical implications that depend on the type of bleed. While deep learning techniques are widely used in medical image segmentation and have been applied to the ICH segmentation task, existing public ICH datasets do not support the multi-class segmentation problem. To address this, we develop the Brain Hemorrhage Segmentation Dataset (BHSD), a 3D multi-class ICH dataset containing 192 volumes with pixel-level annotations and 2200 volumes with slice-level annotations across five categories of ICH. To demonstrate the utility of the dataset, we formulate a series of supervised and semi-supervised ICH segmentation tasks. We provide experimental results with state-of-the-art models as reference benchmarks for further model development and evaluation on this dataset.
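
For readers who want a feel for the benchmark setup, here is a minimal sketch of a per-class Dice evaluation over the five ICH categories. The label convention (0 = background, 1-5 = the five bleed types) is an assumption for illustration; the actual mapping is defined by the BHSD release.

    import torch

    def per_class_dice(pred, target, num_classes=6, eps=1e-6):
        # pred/target: integer label volumes of shape (D, H, W).
        # Class 0 is assumed to be background; 1..5 stand in for the five
        # ICH categories (the real mapping comes from the BHSD release).
        scores = []
        for c in range(1, num_classes):
            p = (pred == c).float()
            t = (target == c).float()
            inter = (p * t).sum()
            scores.append(((2 * inter + eps) / (p.sum() + t.sum() + eps)).item())
        return scores

    # Toy volumes standing in for a prediction and its ground truth.
    pred = torch.randint(0, 6, (32, 64, 64))
    gt = torch.randint(0, 6, (32, 64, 64))
    print(per_class_dice(pred, gt))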

* Accepted by MLMI 2023 

Transformer-based Annotation Bias-aware Medical Image Segmentation

Jun 02, 2023
Zehui Liao, Yutong Xie, Shishuai Hu, Yong Xia

Manual medical image segmentation is subjective and suffers from annotator-related bias, which can be mimicked or amplified by deep learning methods. Recently, researchers have suggested that such bias is the combination of annotator preference and stochastic error, modeled by convolution blocks located after the decoder and a pixel-wise independent Gaussian distribution, respectively. However, convolution blocks are unlikely to effectively model the varying degrees of preference at the full-resolution level, and the independent pixel-wise Gaussian distribution disregards pixel correlations, leading to discontinuous boundaries. This paper proposes a Transformer-based Annotation Bias-aware (TAB) medical image segmentation model, which tackles annotator-related bias by modeling both annotator preference and stochastic error. TAB employs a Transformer with learnable queries to extract preference-focused features, enabling it to produce segmentations with various preferences simultaneously using a single segmentation head. Moreover, TAB adopts a multivariate normal distribution assumption that models pixel correlations, and learns the annotation distribution to disentangle the stochastic error. We evaluated TAB on an optic disc/cup (OD/OC) segmentation benchmark annotated by six annotators. Our results suggest that TAB outperforms existing medical image segmentation models that account for annotator-related bias.
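
As a rough sketch of the queries-plus-shared-head idea described above (not the paper's architecture: the sizes, the cross-attention layer, and the additive conditioning are all illustrative assumptions):

    import torch
    import torch.nn as nn

    class PreferenceQueries(nn.Module):
        # One learnable query per annotator; cross-attention against the
        # flattened feature map yields preference-focused features, and a
        # single shared head produces one segmentation per preference.
        def __init__(self, num_annotators=6, dim=256, num_classes=2):
            super().__init__()
            self.queries = nn.Parameter(torch.randn(num_annotators, dim))
            self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
            self.head = nn.Linear(dim, num_classes)  # the single shared head

        def forward(self, feats):                     # feats: (B, HW, dim)
            q = self.queries.unsqueeze(0).expand(feats.size(0), -1, -1)
            pref, _ = self.attn(q, feats, feats)      # (B, A, dim)
            # Condition every pixel feature on each preference embedding,
            # then reuse the same head for all annotators.
            cond = feats.unsqueeze(1) + pref.unsqueeze(2)   # (B, A, HW, dim)
            return self.head(cond)                    # (B, A, HW, num_classes)

    feats = torch.randn(2, 16 * 16, 256)              # a flattened 16x16 feature map
    print(PreferenceQueries()(feats).shape)           # torch.Size([2, 6, 256, 2])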

* 11 pages, 2 figures 

Attention Mechanisms in Medical Image Segmentation: A Survey

May 29, 2023
Yutong Xie, Bing Yang, Qingbiao Guan, Jianpeng Zhang, Qi Wu, Yong Xia

Medical image segmentation plays an important role in computer-aided diagnosis. Attention mechanisms, which distinguish important parts from irrelevant ones, have been widely used in medical image segmentation tasks. This paper systematically reviews the basic principles of attention mechanisms and their applications in medical image segmentation. First, we review the basic concepts and formulation of the attention mechanism. Second, we survey over 300 articles related to medical image segmentation and divide them into two groups based on their attention mechanisms: non-Transformer attention and Transformer attention. In each group, we analyze the attention mechanisms in depth from three aspects grounded in the current literature: the principle of the mechanism (what to use), implementation methods (how to use), and application tasks (where to use). We also thoroughly analyze the advantages and limitations of their applications to different tasks. Finally, we summarize the current state of research and its shortcomings, and discuss potential future challenges, including task specificity, robustness, and standard evaluation. We hope that this review can showcase the overall research context of traditional and Transformer attention methods, provide a clear reference for subsequent research, and inspire more advanced attention research, not only in medical image segmentation but also in other image analysis scenarios.
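
A compact illustration of the two groups the survey distinguishes: the block below is a squeeze-and-excitation-style channel attention (a classic non-Transformer mechanism), and the last two lines show the Transformer alternative, which treats spatial positions as tokens. All sizes are arbitrary.

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        # Squeeze-and-excitation-style channel attention: global pooling
        # scores each channel, and the scores reweight the feature map.
        def __init__(self, channels, reduction=8):
            super().__init__()
            self.fc = nn.Sequential(
                nn.Linear(channels, channels // reduction), nn.ReLU(),
                nn.Linear(channels // reduction, channels), nn.Sigmoid())

        def forward(self, x):                          # x: (B, C, H, W)
            w = self.fc(x.mean(dim=(2, 3)))            # (B, C) channel scores
            return x * w[:, :, None, None]

    x = torch.randn(1, 32, 64, 64)
    print(ChannelAttention(32)(x).shape)               # torch.Size([1, 32, 64, 64])

    # Transformer attention instead treats spatial positions as tokens:
    tokens = x.flatten(2).transpose(1, 2)              # (B, HW, C)
    sa = nn.MultiheadAttention(32, num_heads=4, batch_first=True)
    print(sa(tokens, tokens, tokens)[0].shape)         # torch.Size([1, 4096, 32])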

* Submitted to Medical Image Analysis, survey paper, 34 pages, over 300 references 

S4M: Generating Radiology Reports by A Single Model for Multiple Body Parts

May 26, 2023
Qi Chen, Yutong Xie, Biao Wu, Minh-Son To, James Ang, Qi Wu

In this paper, we seek to design a report generation model that is able to generate reasonable reports even when given images of various body parts. We start by directly merging multiple datasets and training a single report generation model on the merged one. We observe, however, that reports generated in this simple way achieve only comparable performance to models trained separately on each specific dataset. We suspect that this is caused by the dilemma between the diversity of body parts and the limited availability of medical data. To develop robust and generalizable models, it is important to consider a diverse range of body parts and medical conditions. However, collecting a sufficiently large dataset for each specific body part can be difficult due to factors such as data availability and privacy concerns. Thus, rather than striving for more data, we propose a single-for-multiple (S4M) framework, which facilitates the learning of the report generation model with two auxiliary priors: an explicit prior (i.e., feeding radiology-informed knowledge) and an implicit prior (i.e., guidance by cross-modal features). Specifically, based on the conventional encoder-decoder report generation framework, we incorporate two extra branches: a Radiology-informed Knowledge Aggregation (RadKA) branch and an Implicit Prior Guidance (IPG) branch. We conduct experiments on our merged dataset, which consists of a public dataset (IU-Xray) and five private datasets, covering six body parts: chest, abdomen, knee, hip, wrist, and shoulder. Our S4M model outperforms all the baselines, regardless of whether they are trained on separate or merged datasets. Code is available at: https://github.com/YtongXie/S4M.
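
A schematic-only sketch of how the two branches could plug into a shared encoder-decoder. The branch names follow the abstract, but every layer inside them is a stand-in (plain linear layers and a GRU decoder), not the paper's design; see the repository above for the real implementation.

    import torch
    import torch.nn as nn

    class S4MSketch(nn.Module):
        # Shared encoder-decoder plus the two auxiliary branches named in
        # the abstract; all internals are placeholder layers.
        def __init__(self, dim=512, vocab=3000, steps=20):
            super().__init__()
            self.encoder = nn.Linear(dim, dim)   # stands in for the image encoder
            self.radka = nn.Linear(dim, dim)     # explicit radiology-informed prior
            self.ipg = nn.Linear(dim, dim)       # implicit cross-modal prior
            self.decoder = nn.GRU(dim, dim, batch_first=True)
            self.out = nn.Linear(dim, vocab)
            self.steps = steps

        def forward(self, img_feats, knowledge, text_feats):   # each: (B, dim)
            h = self.encoder(img_feats)
            h = h + self.radka(knowledge) + self.ipg(text_feats)  # fuse both priors
            y, _ = self.decoder(h.unsqueeze(1).repeat(1, self.steps, 1))
            return self.out(y)                                 # (B, steps, vocab)

    m = S4MSketch()
    logits = m(torch.randn(2, 512), torch.randn(2, 512), torch.randn(2, 512))
    print(logits.shape)                                        # torch.Size([2, 20, 3000])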

* 16 pages 

Unsupervised Image Denoising with Score Function

Apr 17, 2023
Yutong Xie, Mingze Yuan, Bin Dong, Quanzheng Li

Though they achieve excellent performance in some cases, current unsupervised learning methods for single-image denoising are often constrained in their applications. In this paper, we propose a new approach that is more general and applicable to complicated noise models. Utilizing the score function, i.e., the gradient of the log-probability, we define a solving system for denoising. Once the score function of the noisy images has been estimated, the denoised result can be obtained through this solving system. Our approach can be applied to multiple noise models, such as a mixture of multiplicative and additive noise combined with structured correlation. Experimental results show that our method performs comparably to existing methods when the noise model is simple, and performs well in complicated cases where other methods are inapplicable or perform poorly.
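
The paper targets noise models well beyond the Gaussian case, but the Gaussian special case makes the score-based "solving system" idea concrete: for y = x + n with n ~ N(0, sigma^2 I), Tweedie's identity gives E[x | y] = y + sigma^2 * score(y), where score(y) is the gradient of log p(y). A minimal sketch checking this identity against the closed-form posterior mean of a 1-D Gaussian prior (an illustration of the principle, not the paper's general construction):

    import numpy as np

    # Gaussian prior x ~ N(mu, tau^2) plus Gaussian noise of std sigma, so
    # the marginal of y is N(mu, tau^2 + sigma^2) and its score is linear in y.
    mu, tau, sigma = 2.0, 1.0, 0.5
    rng = np.random.default_rng(0)
    x = rng.normal(mu, tau, size=100000)
    y = x + rng.normal(0.0, sigma, size=x.shape)

    score = -(y - mu) / (tau**2 + sigma**2)       # grad_y log p(y), closed form
    denoised = y + sigma**2 * score               # Tweedie estimate of E[x | y]

    # The conjugate-Gaussian posterior mean agrees with the Tweedie estimate.
    posterior_mean = mu + tau**2 / (tau**2 + sigma**2) * (y - mu)
    print(np.abs(denoised - posterior_mean).max())   # ~0 up to float error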

UniSeg: A Prompt-driven Universal Segmentation Model as well as A Strong Representation Learner

Apr 07, 2023
Yiwen Ye, Yutong Xie, Jianpeng Zhang, Ziyang Chen, Yong Xia

The universal model is emerging as a promising trend for medical image segmentation, paving the way toward building a medical imaging large model (MILM). One popular strategy for building universal models is to encode each task as a one-hot vector and generate dynamic convolutional layers at the end of the decoder to extract the target of interest. Although successful, this strategy ignores the correlations among tasks and makes the model 'aware' of the ongoing task too late. To address both issues, we propose a prompt-driven Universal Segmentation model (UniSeg) for multi-task medical image segmentation across diverse modalities and domains. We first devise a learnable universal prompt to describe the correlations among all tasks and then convert this prompt and the image features into a task-specific prompt, which is fed to the decoder as part of its input. Thus, the model becomes 'aware' of the ongoing task early, which boosts the task-specific training of the whole decoder. Our results indicate that the proposed UniSeg outperforms other universal models and single-task models on 11 upstream tasks. Moreover, UniSeg also beats other pre-trained models on two downstream datasets, providing the community with a high-quality pre-trained model for 3D medical image segmentation. Code and model are available at https://github.com/yeerwen/UniSeg.
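
A toy sketch of the prompt mechanism as the abstract describes it. The fusion layer, the dimensions, and the indexing by task ID are illustrative guesses, not the released model (see the repository above for the actual code).

    import torch
    import torch.nn as nn

    class TaskPromptSketch(nn.Module):
        # A learnable universal prompt (one row per task, so correlations
        # can be learned jointly) is fused with image features into a
        # task-specific prompt that the decoder receives as extra input.
        def __init__(self, num_tasks=11, dim=64):
            super().__init__()
            self.universal_prompt = nn.Parameter(torch.randn(num_tasks, dim))
            self.fuse = nn.Linear(2 * dim, dim)

        def forward(self, img_feat, task_id):          # img_feat: (B, dim)
            u = self.universal_prompt[task_id].expand(img_feat.size(0), -1)
            # Early task conditioning: the prompt exists before decoding
            # starts, so the whole decoder can train task-specifically.
            return self.fuse(torch.cat([u, img_feat], dim=-1))   # (B, dim)

    m = TaskPromptSketch()
    print(m(torch.randn(2, 64), task_id=3).shape)      # torch.Size([2, 64])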

* 13 pages, 4 figures 

A Prompt Log Analysis of Text-to-Image Generation Systems

Mar 16, 2023
Yutong Xie, Zhaoying Pan, Jinge Ma, Luo Jie, Qiaozhu Mei

Recent developments in large language models (LLMs) and generative AI have unleashed the astonishing capability of text-to-image generation systems to synthesize high-quality images that are faithful to a given reference text, known as a "prompt". These systems have immediately received considerable attention from researchers, creators, and everyday users. Despite the many efforts to improve the generative models themselves, there is limited work on understanding the information needs of their users at scale. We conduct the first comprehensive analysis of large-scale prompt logs collected from multiple text-to-image generation systems. Our work is analogous to analyzing the query logs of Web search engines, a line of work that has made critical contributions to the success of the Web search industry and research. Compared with Web search queries, text-to-image prompts are significantly longer; they are often organized into special structures that consist of the subject, form, and intent of the generation task, and they present unique categories of information needs. Users make more edits within creation sessions, and these sessions exhibit remarkable exploratory patterns. There is also a considerable gap between user-input prompts and the captions of the images included in the open training data of the generative models. Our findings provide concrete implications for improving text-to-image generation systems for creation purposes.
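
The flavor of this kind of log analysis can be reproduced on any prompt log with a few lines of Python; the tiny in-line log below is hypothetical, standing in for the far larger multi-system logs studied in the paper.

    from collections import defaultdict
    from statistics import mean

    # Hypothetical (session_id, prompt) pairs.
    log = [
        ("s1", "a castle on a hill"),
        ("s1", "a castle on a hill, oil painting, dramatic lighting"),
        ("s1", "a castle on a hill, oil painting, golden hour, 4k"),
        ("s2", "portrait of a cat, studio photo"),
    ]

    # Prompt length in words: one of the statistics compared against
    # Web search queries.
    print("mean length:", mean(len(p.split()) for _, p in log))

    # Consecutive prompts within a session approximate the exploratory
    # editing behaviour described above.
    sessions = defaultdict(list)
    for sid, p in log:
        sessions[sid].append(p)
    print("edits per session:", {s: len(ps) - 1 for s, ps in sessions.items()})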

Diffusion Model for Generative Image Denoising

Feb 05, 2023
Yutong Xie, Mingze Yuan, Bin Dong, Quanzheng Li

In supervised learning for image denoising, pairs of clean and noisy images are usually collected or synthesized to train a denoising model, with the L2 norm loss or another distance function as the training objective. This often leads to over-smoothed results with fewer image details. In this paper, we instead regard the denoising task as a problem of estimating the posterior distribution of clean images conditioned on noisy images, and apply the idea of the diffusion model to realize generative image denoising. According to the noise model in denoising tasks, we redefine the diffusion process so that it differs from the original one; sampling from the posterior distribution then becomes a reverse process of dozens of steps starting from the noisy image. We consider three types of noise: Gaussian, Gamma, and Poisson. Backed by theoretical guarantees, we derive a unified strategy for model training. Our method is verified through experiments on the three noise models and achieves excellent performance.
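
A generic sketch of the "reverse process from the noisy image" idea for the Gaussian case: a deterministic DDIM-style schedule that starts at the noisy image rather than at pure noise. The update rule and the assumed denoiser(x, sigma) interface are illustrative, not the paper's redefined diffusion process.

    import torch

    def denoise_by_reverse_process(y, denoiser, sigma_max, steps=30):
        # y: noisy image; denoiser(x, sigma) is assumed to predict the
        # clean image at noise level sigma. Each step shrinks the residual
        # noise toward the next level of a decreasing schedule.
        sigmas = torch.linspace(sigma_max, 0.0, steps + 1)
        x = y.clone()
        for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
            x0_hat = denoiser(x, s_cur.item())            # predicted clean image
            x = x0_hat + (s_next / s_cur) * (x - x0_hat)  # deterministic step
        return x

    # Toy check with an oracle denoiser that already knows the clean image:
    # the reverse process then recovers it exactly.
    clean = torch.zeros(1, 1, 8, 8)
    noisy = clean + 0.5 * torch.randn_like(clean)
    print(denoise_by_reverse_process(noisy, lambda x, s: clean, 0.5).abs().max())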

Instance-specific Label Distribution Regularization for Learning with Label Noise

Dec 16, 2022
Zehui Liao, Shishuai Hu, Yutong Xie, Yong Xia

Modeling the noise transition matrix is a promising approach to learning with label noise. Based on the estimated noise transition matrix and the noisy posterior probabilities, the clean posterior probabilities, which are jointly called the Label Distribution (LD) in this paper, can be calculated and used as the supervision. To reliably estimate the noise transition matrix, some methods assume that anchor points are available during training. Nonetheless, if the anchor points are invalid, the noise transition matrix might be poorly learned, resulting in poor performance. Consequently, other methods treat reliable data points extracted from the training data as pseudo anchor points. However, from a statistical point of view, the noise transition matrix can be inferred from data with noisy labels under the clean-label-domination assumption, so we aim to estimate it without (pseudo) anchor points. There is evidence that samples are more likely to be mislabeled as similar classes, which means the mislabeling probability is highly correlated with the inter-class correlation. Inspired by this observation, we propose an instance-specific Label Distribution Regularization (LDR), in which the instance-specific LD is estimated as the supervision, to prevent DCNNs from memorizing noisy labels. Specifically, we estimate the noisy posterior under the supervision of noisy labels, and approximate the batch-level noise transition matrix by estimating the inter-class correlation matrix with neither anchor points nor pseudo anchor points. Experimental results on two synthetic noisy datasets and two real-world noisy datasets demonstrate that our LDR outperforms existing methods.
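
A rough sketch of the anchor-free, batch-level idea suggested by the abstract: class prototypes built from the batch's predicted posteriors give an inter-class similarity matrix, which is row-normalized into a transition-matrix estimate. This mirrors the stated intuition (similar classes are confused more often); the paper's actual estimator may differ.

    import torch
    import torch.nn.functional as F

    def batch_transition_estimate(logits, labels, num_classes):
        # Class prototypes from the batch's predicted posteriors.
        probs = F.softmax(logits, dim=1)                  # (B, C)
        protos = []
        for c in range(num_classes):
            mask = labels == c
            if mask.any():
                protos.append(probs[mask].mean(dim=0))
            else:                                         # no sample of class c
                protos.append(torch.full((num_classes,), 1.0 / num_classes))
        protos = torch.stack(protos)                      # (C, C)
        sim = protos @ protos.t()                         # inter-class correlation
        return sim / sim.sum(dim=1, keepdim=True)         # rows sum to 1

    logits = torch.randn(64, 4)
    labels = torch.randint(0, 4, (64,))
    print(batch_transition_estimate(logits, labels, 4).sum(dim=1))  # all ones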
