Hyungyung Lee

Unified Chest X-ray and Radiology Report Generation Model with Multi-view Chest X-rays

Mar 01, 2023
Hyungyung Lee, Da Young Lee, Wonjae Kim, Jin-Hwa Kim, Tackeun Kim, Jihang Kim, Leonard Sunwoo, Edward Choi

Synthetic data generated for medical research can replace privacy- and security-sensitive data with large-scale curated datasets, reducing data collection and annotation costs. As part of this effort, we propose UniXGen, a unified chest X-ray and report generation model, with the following contributions. First, we design a unified model for bidirectional chest X-ray and report generation by adopting a vector quantization method to discretize chest X-rays into discrete visual tokens and formulating both tasks as sequence generation. Second, we introduce several special tokens to generate chest X-rays for specific views, which is useful when the desired views are unavailable. Furthermore, UniXGen can flexibly take anywhere from a single view to multiple views as input, taking advantage of the additional findings available in other X-ray views. We adopt an efficient transformer for computational and memory efficiency to handle the long input sequences of high-resolution multi-view chest X-rays and long paragraph reports. In extensive experiments, we show that the unified model has a synergistic effect on both generation tasks compared to training task-specific models alone. We also find that the view-specific special tokens can distinguish between different views and properly generate specific views even when they do not exist in the dataset, and that utilizing multi-view chest X-rays faithfully captures abnormal findings visible in the additional views. The source code is publicly available at: https://github.com/ttumyche/UniXGen.
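Below is a minimal sketch, in Python/PyTorch, of the sequence-construction idea described in the abstract: each X-ray view is discretized by a VQ-style tokenizer, prefixed with a view-specific special token, and concatenated with the tokenized report into one sequence for autoregressive modeling. It is not the released UniXGen code; the special-token names, `quantize_image`, `build_sequence`, and the `dummy_vq` stand-in are all hypothetical.

```python
import torch

# Hypothetical special tokens marking the view of each X-ray and the report span.
SPECIAL_TOKENS = {"[PA]": 0, "[AP]": 1, "[LATERAL]": 2, "[REPORT]": 3, "[EOS]": 4}
SPECIAL_OFFSET = len(SPECIAL_TOKENS)


def quantize_image(image: torch.Tensor, vq_encoder) -> torch.Tensor:
    """Discretize one X-ray into a 1-D sequence of codebook indices."""
    # vq_encoder stands in for a pretrained VQ-GAN/VQ-VAE encoder + quantizer.
    codes = vq_encoder(image.unsqueeze(0))      # e.g. a (1, 8, 8) index map
    return codes.flatten() + SPECIAL_OFFSET     # shift past the special-token ids


def build_sequence(views, report_ids, vq_encoder) -> torch.Tensor:
    """[view token, image tokens] per view, then [REPORT], report tokens, [EOS]."""
    pieces = []
    for view_name, image in views:              # e.g. [("[PA]", pa_img), ("[LATERAL]", lat_img)]
        pieces.append(torch.tensor([SPECIAL_TOKENS[view_name]]))
        pieces.append(quantize_image(image, vq_encoder))
    pieces.append(torch.tensor([SPECIAL_TOKENS["[REPORT]"]]))
    pieces.append(report_ids)                   # report text already mapped to ids
    pieces.append(torch.tensor([SPECIAL_TOKENS["[EOS]"]]))
    # A real model would keep image, text, and special vocabularies disjoint;
    # the single offset above is only for illustration.
    return torch.cat(pieces)


# Toy usage: a dummy "tokenizer" that just bins downsampled pixel intensities.
dummy_vq = lambda x: (x.mean(dim=1)[:, ::32, ::32] * 255).long() % 512
pa_img, lat_img = torch.rand(3, 256, 256), torch.rand(3, 256, 256)
report_ids = torch.randint(0, 1000, (64,))
seq = build_sequence([("[PA]", pa_img), ("[LATERAL]", lat_img)], report_ids, dummy_vq)
```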

Unconditional Image-Text Pair Generation with Multimodal Cross Quantizer

Apr 15, 2022
Hyungyung Lee, Sungjin Park, Edward Choi

Though deep generative models have attracted considerable attention, most existing work is designed for unimodal generation tasks. In this paper, we explore a new method for unconditional image-text pair generation. We propose MXQ-VAE, a vector quantization method for multimodal image-text representation. MXQ-VAE accepts a paired image and text as input and learns a joint quantized representation space, so that the image-text pair can be converted into a sequence of unified indices. We can then use autoregressive generative models to model the joint image-text representation and even perform unconditional image-text pair generation. Extensive experimental results demonstrate that our approach effectively generates semantically consistent image-text pairs and also enhances meaningful alignment between image and text.
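As a rough illustration of the joint quantization idea, the sketch below maps fused image-text features to one sequence of indices against a single shared codebook, which an autoregressive model could then learn to sample. It is a toy under stated assumptions, not MXQ-VAE itself; `JointQuantizer` and the feature shapes are invented for illustration.

```python
import torch
import torch.nn as nn


class JointQuantizer(nn.Module):
    """Toy stand-in for a multimodal cross quantizer with one shared codebook."""

    def __init__(self, codebook_size: int = 1024, dim: int = 256):
        super().__init__()
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, joint_features: torch.Tensor):
        # joint_features: (B, N, dim) features fused from a paired image and text
        # by an upstream encoder (not shown). Nearest-neighbour lookup against the
        # shared codebook; the straight-through estimator used in training is omitted.
        flat = joint_features.reshape(-1, joint_features.size(-1))       # (B*N, dim)
        dists = torch.cdist(flat, self.codebook.weight)                  # (B*N, codebook_size)
        indices = dists.argmin(dim=-1).view(joint_features.shape[:-1])   # (B, N) unified indices
        quantized = self.codebook(indices)                               # (B, N, dim)
        return quantized, indices


# Toy usage: pretend an upstream encoder fused a 16x16 image grid and 32 text
# tokens into 288 joint features; the unified indices could then be modeled by
# any autoregressive generator and decoded back into an image-text pair.
fused = torch.randn(2, 16 * 16 + 32, 256)
quantized, unified_indices = JointQuantizer()(fused)   # unified_indices: (2, 288)
```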

* ICLR 2022 workshop on Deep Generative Models for Highly Structured Data 

Multi-modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training

May 24, 2021
Jong Hak Moon, Hyungyung Lee, Woncheol Shin, Edward Choi

Recently, a number of studies have demonstrated impressive performance on diverse vision-language multi-modal tasks, such as image captioning and visual question answering, by extending the BERT architecture with multi-modal pre-training objectives. In this work, we explore a broad set of multi-modal representation learning tasks in the medical domain, specifically using radiology images and their unstructured reports. We propose Medical Vision Language Learner (MedViLL), which adopts a Transformer-based architecture combined with a novel multimodal attention masking scheme to maximize generalization performance for both vision-language understanding tasks (image-report retrieval, disease classification, medical visual question answering) and a vision-language generation task (report generation). By rigorously evaluating the proposed model on four downstream tasks with two chest X-ray image datasets (MIMIC-CXR and Open-I), we empirically demonstrate the superior downstream task performance of MedViLL against various baselines, including task-specific architectures.
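One plausible reading of the attention-masking idea, sketched below, is a seq2seq-style self-attention mask: image tokens attend bidirectionally to each other, while report tokens attend to all image tokens but only causally to earlier report tokens, which supports report generation (a fully bidirectional mask would instead suit the understanding tasks). This is an assumption for illustration, not MedViLL's published scheme, and `build_generation_mask` is a hypothetical helper.

```python
import torch


def build_generation_mask(num_image_tokens: int, num_text_tokens: int) -> torch.Tensor:
    """Return an (L, L) boolean self-attention mask; True means attention is allowed."""
    n = num_image_tokens + num_text_tokens
    mask = torch.zeros(n, n, dtype=torch.bool)
    # Image tokens: full bidirectional attention among themselves.
    mask[:num_image_tokens, :num_image_tokens] = True
    # Report tokens: attend to every image token ...
    mask[num_image_tokens:, :num_image_tokens] = True
    # ... and causally to report tokens up to and including their own position.
    causal = torch.ones(num_text_tokens, num_text_tokens).tril().bool()
    mask[num_image_tokens:, num_image_tokens:] = causal
    return mask


# Example: 4 image patches + 3 report tokens -> a 7x7 mask that could be passed
# (after conversion to whatever additive form a given library expects) to a
# Transformer self-attention layer.
print(build_generation_mask(4, 3))
```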

* v1: Main paper + supplementary material (15 pages, 5 figures, 6 tables) 