Ziheng Zhao

PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering

May 24, 2023
Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Weixiong Lin, Ya Zhang, Yanfeng Wang, Weidi Xie

In this paper, we focus on the problem of Medical Visual Question Answering (MedVQA), which is crucial for efficiently interpreting medical images that carry vital, clinically relevant information. Firstly, we reframe MedVQA as a generation task that naturally follows human-machine interaction, and propose a generative model for medical visual understanding that aligns visual information from a pre-trained vision encoder with a large language model. Secondly, we establish a scalable pipeline to construct a large-scale medical visual question-answering dataset, named PMC-VQA, which contains 227k VQA pairs over 149k images covering various modalities and diseases. Thirdly, we pre-train our proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD and SLAKE, outperforming existing work by a large margin. Additionally, we propose a manually verified test set that is significantly more challenging and that even the best models struggle to solve.
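
The abstract describes aligning a pre-trained vision encoder with a large language model so that answers are generated rather than classified. The snippet below is a minimal sketch of that idea, assuming a ViT-style encoder and a HuggingFace-style causal LM; the projection layer, names, and dimensions are illustrative assumptions, not the released PMC-VQA code.

```python
import torch
import torch.nn as nn

class GenerativeMedVQA(nn.Module):
    """Sketch: align a pre-trained vision encoder with a causal LLM via a projection layer."""

    def __init__(self, vision_encoder, language_model, vis_dim=768, llm_dim=4096):
        super().__init__()
        self.vision_encoder = vision_encoder   # pre-trained image encoder, e.g. a ViT
        self.language_model = language_model   # causal LM that decodes the answer text
        # hypothetical alignment module: map visual features into the LLM embedding space
        self.proj = nn.Linear(vis_dim, llm_dim)

    def forward(self, images, question_embeds):
        vis_feats = self.vision_encoder(images)        # (B, N, vis_dim) patch features
        vis_tokens = self.proj(vis_feats)              # (B, N, llm_dim) pseudo "visual tokens"
        # prepend visual tokens to the embedded question; the LLM then generates the
        # answer autoregressively (HuggingFace-style inputs_embeds interface assumed)
        inputs = torch.cat([vis_tokens, question_embeds], dim=1)
        return self.language_model(inputs_embeds=inputs)
```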

PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents

Mar 13, 2023
Weixiong Lin, Ziheng Zhao, Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie

Foundation models trained on large-scale datasets have recently surged in CV and NLP. In contrast, development in the biomedical domain lags far behind due to data scarcity. To address this issue, we build and release PMC-OA, a biomedical dataset with 1.6M image-caption pairs collected from PubMed Central's Open Access subset, which is 8 times larger than previous datasets. PMC-OA covers diverse modalities and diseases, with the majority of image-caption samples aligned at a finer-grained level, i.e., subfigure and subcaption. Pretraining a CLIP-style model on PMC-OA, our model, named PMC-CLIP, achieves state-of-the-art results on various downstream tasks, including image-text retrieval on ROCO, MedMNIST image classification, and Medical VQA, e.g., +8.1% R@10 on image-text retrieval and +3.9% accuracy on image classification.
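
As a rough illustration of the CLIP-style pre-training mentioned above, the sketch below computes a symmetric contrastive (InfoNCE) loss over a batch of matched image-caption embeddings; the function name, feature shapes, and temperature are assumptions for illustration, not details of the PMC-CLIP release.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE loss for a batch of matched image-caption pairs."""
    image_feats = F.normalize(image_feats, dim=-1)   # (B, D) image embeddings
    text_feats = F.normalize(text_feats, dim=-1)     # (B, D) caption embeddings
    logits = image_feats @ text_feats.t() / temperature   # (B, B) cosine similarities
    targets = torch.arange(logits.size(0), device=logits.device)  # i-th image matches i-th caption
    loss_i2t = F.cross_entropy(logits, targets)      # image-to-text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)  # text-to-image direction
    return (loss_i2t + loss_t2i) / 2
```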

* 10 pages, 3 figures 

K-Space Transformer for Fast MRI Reconstruction with Implicit Representation

Jun 14, 2022
Ziheng Zhao, Tianjiao Zhang, Weidi Xie, Yanfeng Wang, Ya Zhang

This paper considers the problem of fast MRI reconstruction. We propose a novel Transformer-based framework for directly processing the sparsely sampled signals in k-space, going beyond the regular-grid limitation of ConvNets. We adopt an implicit representation of the spectrogram, treating spatial coordinates as inputs, and dynamically query the partially observed measurements to complete the spectrogram, i.e., learning the inductive bias in k-space. To strike a balance between computational cost and reconstruction quality, we build a hierarchical structure with low-resolution and high-resolution decoders, respectively. To validate the necessity of our proposed modules, we conduct extensive experiments on two public datasets and demonstrate superior or comparable performance over state-of-the-art approaches.
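
To make the coordinate-query idea concrete, here is a minimal sketch in which k-space coordinates are embedded as queries that cross-attend to the partially observed measurements and predict the missing complex values; the module names, shapes, and single-decoder layout (the paper uses hierarchical low- and high-resolution decoders) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class KSpaceQueryDecoder(nn.Module):
    """Sketch: complete k-space by querying observed measurements at target coordinates."""

    def __init__(self, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        self.coord_embed = nn.Linear(2, d_model)      # (kx, ky) coordinate -> query token
        self.meas_embed = nn.Linear(2 + 2, d_model)   # coordinate + complex value -> memory token
        layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.out = nn.Linear(d_model, 2)              # predicted real and imaginary parts

    def forward(self, query_coords, sampled_coords, sampled_values):
        # query_coords: (B, Q, 2); sampled_coords: (B, S, 2); sampled_values: (B, S, 2)
        queries = self.coord_embed(query_coords)
        memory = self.meas_embed(torch.cat([sampled_coords, sampled_values], dim=-1))
        decoded = self.decoder(queries, memory)       # queries attend to observed k-space samples
        return self.out(decoded)                      # k-space values at the queried coordinates
```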
