The widespread application of artificial intelligence in health research is currently hampered by limitations in data availability. Distributed learning methods such as federated learning (FL) and split learning (SL) have been introduced to solve this problem, as well as data management and ownership issues, each with different strengths and weaknesses. The recently proposed federated split task-agnostic (FeSTA) learning tries to reconcile the distinct merits of FL and SL by enabling multi-task collaboration between participants through a Vision Transformer (ViT) architecture, but it suffers from higher communication overhead. To address this, here we present p-FeSTA, a multi-task distributed learning method using ViT with random patch permutation. Instead of using a CNN-based head as in FeSTA, p-FeSTA adopts a simple patch embedder with random patch permutation, improving the multi-task learning performance without sacrificing privacy. Experimental results confirm that the proposed method significantly enhances the benefit of multi-task collaboration, communication efficiency, and privacy preservation, shedding light on practical multi-task distributed learning in the field of medical imaging.
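As a rough illustration of the random patch permutation idea, the sketch below splits an image into patches and shuffles their order. It is a minimal sketch under stated assumptions, not the method of the paper: the abstract does not say at which stage the permutation is applied, and the helper names `split_into_patches` and `permute_patches` are hypothetical.

```python
import random


def split_into_patches(image, patch):
    """Split an H x W image (a list of rows) into flattened patches in raster order."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ])
    return patches


def permute_patches(patches, rng):
    """Return the patches in a random order, together with the permutation used.

    Assumption for illustration only: the permutation is drawn per sample
    and is never shared with the party that receives the shuffled patches.
    """
    order = list(range(len(patches)))
    rng.shuffle(order)
    return [patches[i] for i in order], order


if __name__ == "__main__":
    # Toy 4 x 4 "image" whose pixel values encode their raster position.
    image = [[row * 4 + col for col in range(4)] for row in range(4)]
    patches = split_into_patches(image, patch=2)
    shuffled, order = permute_patches(patches, random.Random(0))
    print("permutation:", order)
    print("shuffled patches:", shuffled)
```

The shuffled sequence contains exactly the same patches as the original, so any model component that does not depend on patch order sees the same set of inputs, while the spatial layout is hidden from whoever receives the shuffled sequence.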
Patient scans from MRI often suffer from noise, which hampers the diagnostic capability of such images. As a method to mitigate such artifacts, denoising has been widely studied both within the medical imaging community and beyond it as a general subject. However, recent deep neural network-based approaches mostly rely on minimum mean squared error (MMSE) estimates, which tend to produce blurred outputs. Moreover, such models suffer when deployed in real-world situations, namely with out-of-distribution data and with complex noise distributions that deviate from the usual parametric noise models. In this work, we propose a new denoising method based on score-based reverse diffusion sampling, which overcomes all the aforementioned drawbacks. Our network, trained only with coronal knee scans, excels even on out-of-distribution in vivo liver MRI data contaminated with a complex mixture of noise. Moreover, we propose a method to enhance the resolution of the denoised image with the same network. With extensive experiments, we show that our method establishes state-of-the-art performance while having desirable properties that prior MMSE denoisers did not have: flexibly choosing the extent of denoising, and quantifying uncertainty.
There have been many recent research efforts to fine-tune a pre-trained generator with a few target images to generate images of a novel domain. Unfortunately, these methods often suffer from overfitting or underfitting when fine-tuned with a single target image. To address this, here we present a novel single-shot GAN adaptation method through unified CLIP space manipulations. Specifically, our model employs a two-step training strategy: reference image search in the source generator using a CLIP-guided latent optimization, followed by generator fine-tuning with a novel loss function that imposes CLIP space consistency between the source and adapted generators. To further encourage the adapted model to produce spatially consistent samples with respect to the source generator, we also propose contrastive regularization for patchwise relationships in the CLIP space. Experimental results show that our model generates diverse outputs with the target texture and outperforms the baseline models both qualitatively and quantitatively. Furthermore, we show that our CLIP space manipulation strategy allows more effective attribute editing.
Recently, contrastive learning-based image translation methods have been proposed, which contrast different spatial locations to enhance the spatial correspondence. However, these methods often ignore the diverse semantic relations within the images. To address this, here we propose a novel semantic relation consistency (SRC) regularization along with decoupled contrastive learning, which utilizes the diverse semantics by focusing on the heterogeneous semantics between the image patches of a single image. To further improve the performance, we present a hard negative mining scheme that exploits the semantic relation. We verified our method on three tasks: single-modal and multi-modal image translation, and GAN compression for image translation. Experimental results confirmed the state-of-the-art performance of our method in all three tasks.
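To make "contrasting different spatial locations" concrete, the sketch below computes a generic InfoNCE-style loss for one query patch feature, one positive, and a set of negatives. This is the standard contrastive objective used here as an assumption for illustration; it is not the SRC regularization or the decoupled contrastive loss proposed in the abstract. The function names and the temperature value `tau=0.1` are hypothetical choices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(query, positive, negatives, tau=0.1):
    """Generic InfoNCE loss for a single query.

    The loss is small when the query is close to its positive (the patch
    at the same location in the other image) and far from the negatives
    (patches at other locations).
    """
    pos = math.exp(cosine(query, positive) / tau)
    neg = sum(math.exp(cosine(query, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))


if __name__ == "__main__":
    query = [1.0, 0.0]
    # Positive aligned with the query, negatives pointing elsewhere.
    aligned = info_nce(query, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
    # Positive orthogonal to the query, negative aligned with it.
    misaligned = info_nce(query, [0.0, 1.0], [[1.0, 0.0]])
    print(f"aligned loss: {aligned:.6f}, misaligned loss: {misaligned:.6f}")
```

A loss of this form treats every negative patch alike, which is the limitation the abstract points to when it says existing methods ignore the diverse semantic relations within an image.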
Gastric endoscopic screening is an effective way to decide appropriate gastric cancer (GC) treatment at an early stage, reducing the GC-associated mortality rate. Although artificial intelligence (AI) has shown great promise in assisting pathologists to screen digitalized whole slide images, automatic classification systems for guiding proper GC treatment based on clinical guidelines are still lacking. Here, we propose an AI system that classifies five classes of GC histology, which can be directly matched to general treatment guidance. The AI system, which mimics the way pathologists understand slides through a multi-scale self-attention mechanism using a 2-stage Vision Transformer, demonstrates clinical capability by achieving diagnostic sensitivity above 85% in both internal and external cohort analyses. Furthermore, AI-assisted pathologists showed diagnostic sensitivity significantly improved by 10% while saving 18% of screening time compared to pathologists working without AI assistance. Our AI system has great potential for providing a presumptive pathologic opinion for deciding proper treatment for early GC patients.
Ultrasound (US) is widely used for clinical imaging applications thanks to its real-time and non-invasive nature. However, its lesion detectability is often limited in many applications due to the phase aberration artifacts caused by variations in the speed of sound (SoS) within body parts. To address this, here we propose a novel self-supervised 3D CNN that enables phase aberration robust plane-wave imaging. Instead of estimating the SoS distribution as in conventional methods, our approach is unique in that the network is trained in a self-supervised manner to robustly generate a high-quality image from various phase-aberrated images by modeling the variation in the speed of sound as stochastic. Experimental results using real measurements from a tissue-mimicking phantom and in vivo scans confirmed that the proposed method can significantly reduce the phase aberration artifacts and improve the visual quality of deep scans.
Although deep learning-based computer-aided diagnosis systems have recently achieved expert-level performance, developing a robust deep learning model requires large, high-quality data with manual annotation, which is expensive to obtain. This situation poses the problem that the chest X-rays collected annually in hospitals cannot be used due to the lack of manual labeling by experts, especially in deprived areas. To address this, here we present a novel deep learning framework that uses knowledge distillation through self-supervised learning and self-training, which shows that the performance of the original model trained with a small number of labels can be gradually improved with more unlabeled data. Experimental results show that the proposed framework maintains impressive robustness against a real-world environment and has general applicability to several diagnostic tasks such as tuberculosis, pneumothorax, and COVID-19. Notably, we demonstrated that our model performs even better than those trained with the same amount of labeled data. The proposed framework has great potential for medical imaging, where plenty of data is accumulated every year but ground truth annotations are expensive to obtain.
Understanding the implicit bias of gradient descent has been an important goal in machine learning research. Unfortunately, even for a single-neuron ReLU network, it was recently proved impossible to characterize the implicit regularization with the square loss by an explicit function of the norm of the model parameters. In order to close the gap between the existing theory and the intriguing empirical behavior of ReLU networks, here we examine the gradient flow dynamics in the parameter space when training single-neuron ReLU networks. Specifically, we discover an implicit bias in terms of support vectors in ReLU networks, which play a key role in why and how ReLU networks generalize well. Moreover, we analyze gradient flows with respect to the magnitude of the norm of the initialization, and show the impact of the norm on the gradient dynamics. Lastly, under some conditions, we prove that the norm of the learned weight strictly increases along the gradient flow.
Contrastive learning is a method of learning visual representations by training Deep Neural Networks (DNNs) to increase the similarity between representations of positive pairs and reduce the similarity between representations of negative pairs. However, contrastive methods usually require large datasets with a significant number of negative pairs per iteration to achieve reasonable performance on downstream tasks. To address this problem, here we propose Energy-Based Contrastive Learning (EBCLR), which combines contrastive learning with Energy-Based Models (EBMs) and can be theoretically interpreted as learning the joint distribution of positive pairs. Using a novel variant of Stochastic Gradient Langevin Dynamics (SGLD) to accelerate the training of EBCLR, we show that EBCLR is far more sample-efficient than previous self-supervised learning methods. Specifically, EBCLR shows 4× to 20× acceleration compared to SimCLR and MoCo v2 in terms of training epochs. Furthermore, in contrast to SimCLR, EBCLR achieves nearly the same performance with 254 negative pairs (batch size 128) and 30 negative pairs (batch size 16) per positive pair, demonstrating the robustness of EBCLR to a small number of negative pairs.
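The negative-pair counts quoted above are consistent with the usual SimCLR-style counting, in which each of the N images in a batch yields two augmented views and every view other than the two forming the positive pair serves as a negative. The abstract does not state this convention explicitly, so the sketch below treats it as an assumption; the function name is hypothetical.

```python
def negatives_per_positive(batch_size: int) -> int:
    """Negatives per positive pair under SimCLR-style counting (an assumption).

    A batch of `batch_size` images gives 2 * batch_size augmented views.
    Removing the two views that form the positive pair leaves the negatives.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return 2 * batch_size - 2


if __name__ == "__main__":
    for n in (128, 16):
        print(f"batch size {n}: {negatives_per_positive(n)} negatives per positive pair")
```

Under this convention, batch sizes 128 and 16 give 254 and 30 negatives respectively, matching the figures in the abstract.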
Deformable image registration is one of the fundamental tasks for medical imaging and computer vision. Classical registration algorithms usually rely on iterative optimization approaches to provide accurate deformation, which incur high computational cost. Although many deep-learning-based methods have been developed to carry out fast image registration, it is still challenging to estimate the deformation field with fewer topological folding problems. Furthermore, these approaches only enable registration to a single fixed image, and it is not possible to obtain continuously varying registration results between the moving and fixed images. To address this, here we present a novel approach to diffusion model-based probabilistic image registration, called DiffuseMorph. Specifically, our model learns the score function of the deformation between moving and fixed images. Similar to existing diffusion models, DiffuseMorph not only provides synthetic deformed images through a reverse diffusion process, but also enables various levels of deformation of the moving image along the latent space. Experimental results on 2D facial expression image and 3D brain image registration tasks demonstrate that our method can provide flexible and accurate deformation with a capability for topology preservation.