Junjie Yao

Instruction-ViT: Multi-Modal Prompts for Instruction Learning in ViT

Apr 29, 2023
Zhenxiang Xiao, Yuzhong Chen, Lu Zhang, Junjie Yao, Zihao Wu, Xiaowei Yu, Yi Pan, Lin Zhao, Chong Ma, Xinyu Liu, Wei Liu, Xiang Li, Yixuan Yuan, Dinggang Shen, Dajiang Zhu, Tianming Liu, Xi Jiang

Prompts have been shown to play a crucial role in large language models, and in recent years vision models have also adopted prompts to improve scalability across multiple downstream tasks. In this paper, we focus on adapting instruction-tuning-style prompt design to a vision transformer model for image classification, which we call Instruction-ViT. The key idea is to use multi-modal prompts (text or image prompts) related to category information to guide the fine-tuning of the model. In experiments on several image captioning tasks, both performance and domain adaptability were improved. Our work provides an innovative strategy for fusing multi-modal prompts, yielding better performance and faster adaptability for visual classification models.
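
The listing carries no code, but the prompt-fusion idea can be illustrated with a minimal PyTorch sketch: one prompt token per category (projected from any external text or image encoder) is prepended to the patch tokens of a ViT-style encoder, and classification is done by matching the [CLS] feature against each category's prompt token. All module names and sizes here (`PromptedViT`, `prompt_proj`, the 512-dimensional prompt features) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PromptedViT(nn.Module):
    """Toy ViT-style encoder that prepends one prompt token per category
    (projected from any text or image encoder) to the patch tokens."""

    def __init__(self, dim=256, depth=4, heads=4, num_patches=196, prompt_dim=512):
        super().__init__()
        self.patch_embed = nn.Linear(768, dim)         # stand-in for the conv patch embedding
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, 1 + num_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.prompt_proj = nn.Linear(prompt_dim, dim)  # maps external prompt features into token space

    def forward(self, patches, prompt_feats):
        # patches: (B, num_patches, 768); prompt_feats: (num_classes, prompt_dim)
        x = self.patch_embed(patches) + self.pos_embed[:, 1:]
        cls = self.cls_token.expand(x.size(0), -1, -1) + self.pos_embed[:, :1]
        prompts = self.prompt_proj(prompt_feats).unsqueeze(0).expand(x.size(0), -1, -1)
        x = self.encoder(torch.cat([cls, prompts, x], dim=1))  # [CLS] + category prompts + patches
        cls_out, prompt_out = x[:, 0], x[:, 1:1 + prompt_feats.size(0)]
        # Classify by similarity between the [CLS] feature and each category's prompt token.
        return torch.einsum("bd,bcd->bc", cls_out, prompt_out)

logits = PromptedViT()(torch.randn(2, 196, 768), torch.randn(10, 512))  # shape (2, 10)
```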

ChatABL: Abductive Learning via Natural Language Interaction with ChatGPT

Apr 21, 2023
Tianyang Zhong, Yaonai Wei, Li Yang, Zihao Wu, Zhengliang Liu, Xiaozheng Wei, Wenjun Li, Junjie Yao, Chong Ma, Xiang Li, Dajiang Zhu, Xi Jiang, Junwei Han, Dinggang Shen, Tianming Liu, Tuo Zhang

Large language models (LLMs) such as ChatGPT have recently demonstrated significant potential in mathematical abilities, providing a valuable reasoning paradigm consistent with human natural language. However, LLMs currently have difficulty bridging perception, language understanding, and reasoning capabilities due to the incompatibility of the underlying information flow among them, making it challenging to accomplish tasks autonomously. On the other hand, abductive learning (ABL) frameworks, which integrate perception and reasoning, have seen significant success in the inverse decipherment of incomplete facts, but they are limited by a lack of semantic understanding of logical reasoning rules and a dependence on complicated domain knowledge representation. This paper presents a novel method (ChatABL) for integrating LLMs into the ABL framework, aiming to unify the three abilities in a more user-friendly and understandable manner. The proposed method uses the strengths of LLMs' understanding and logical reasoning to correct incomplete logical facts and optimize the performance of the perceptual module, by summarizing and reorganizing reasoning rules represented in natural language. In turn, the perceptual module provides the necessary reasoning examples to the LLM in natural language. The variable-length handwritten equation deciphering task, an abstract expression of Mayan calendar decoding, is used as a testbed to demonstrate that ChatABL has reasoning ability beyond most existing state-of-the-art methods, as supported by comparative studies. To the best of our knowledge, ChatABL is the first attempt to explore a new pattern for approaching human-level cognitive ability via natural language interaction with ChatGPT.
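
As a rough illustration of the loop described in this abstract, the sketch below shows one abductive round: a perceptual module produces candidate logical facts, an LLM is asked in plain language to correct them against the reasoning rules, and the corrections are fed back as pseudo-labels. `ask_llm`, `parse_fact_list`, and the prompt wording are hypothetical stand-ins, not the ChatABL code.

```python
import ast

def ask_llm(prompt: str) -> str:
    """Stand-in for a ChatGPT-style chat call; plug in any LLM client here."""
    raise NotImplementedError("connect to an LLM of your choice")

def parse_fact_list(reply: str):
    """Assumes the LLM was asked to reply with a Python-style list literal."""
    return ast.literal_eval(reply)

def abductive_round(perception_model, images, rules_text):
    """One ABL-style round: perceive -> abduce corrections via the LLM -> retrain."""
    # 1) Perception: map raw inputs (e.g. handwritten symbols) to candidate logical facts.
    facts = [perception_model.predict(img) for img in images]

    # 2) Abduction in natural language: ask the LLM to revise facts that violate the rules.
    prompt = (
        f"Reasoning rules (plain language): {rules_text}\n"
        f"Candidate facts: {facts}\n"
        "Some facts may be incomplete or wrong. "
        "Reply with a corrected Python list consistent with the rules."
    )
    corrected = parse_fact_list(ask_llm(prompt))

    # 3) Feedback: use the corrected facts as pseudo-labels to refine the perceptual module.
    perception_model.fit(images, corrected)
    return corrected
```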

Deep Image Prior for Sparse-sampling Photoacoustic Microscopy

Oct 15, 2020
Tri Vu, Anthony DiSpirito III, Daiwei Li, Zixuan Zhang, Xiaoyi Zhu, Maomao Chen, Dong Zhang, Jianwen Luo, Yu Shrike Zhang, Roarke Horstmeyer, Junjie Yao

Photoacoustic microscopy (PAM) is an emerging method for imaging both structural and functional information without the need for exogenous contrast agents. However, state-of-the-art PAM faces a tradeoff between imaging speed and spatial sampling density within the same field-of-view (FOV). Limited by the pulsed laser's repetition rate, the imaging speed is inversely proportional to the total number of effective pixels. To cover the same FOV in a shorter amount of time with the same PAM hardware, there is currently no other option than to decrease spatial sampling density (i.e., sparse sampling). Deep learning methods have recently been used to improve sparsely sampled PAM images; however, these methods often require time-consuming pre-training and a large training dataset that has fully sampled, co-registered ground truth. In this paper, we propose using a method known as "deep image prior" to improve the image quality of sparsely sampled PAM images. The network does not need prior learning or fully sampled ground truth, making its implementation more flexible and much quicker. Our results show promising improvement in PA vasculature images with as few as 2% of the effective pixels. Our deep image prior approach produces results that outperform interpolation methods and can be readily translated to other high-speed, sparse-sampling imaging modalities.
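
A minimal deep-image-prior loop consistent with this description might look like the PyTorch sketch below: a randomly initialized CNN with a fixed random input is fitted only to the acquired (masked) pixels, so the unsampled pixels are filled in by the network's structural prior with no pre-training or fully sampled ground truth. The tiny network, iteration count, and learning rate are placeholder assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

def dip_reconstruct(sparse_img, mask, iters=2000, lr=1e-3):
    # sparse_img, mask: (1, 1, H, W) tensors; mask is 1 at acquired pixels, 0 elsewhere.
    net = nn.Sequential(                              # small encoder-decoder stand-in
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 1, 3, padding=1),
    )
    z = torch.randn(1, 32, *sparse_img.shape[-2:])    # fixed random input code
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        recon = net(z)
        # Fit only the acquired pixels; no prior training and no fully sampled target.
        loss = ((recon - sparse_img) ** 2 * mask).sum() / mask.sum()
        loss.backward()
        opt.step()
    return net(z).detach()
```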

Reconstructing undersampled photoacoustic microscopy images using deep learning

May 30, 2020
Anthony DiSpirito III, Daiwei Li, Tri Vu, Maomao Chen, Dong Zhang, Jianwen Luo, Roarke Horstmeyer, Junjie Yao

One primary technical challenge in photoacoustic microscopy (PAM) is the necessary compromise between spatial resolution and imaging speed. In this study, we propose a novel application of deep learning principles to reconstruct undersampled PAM images and transcend the trade-off between spatial resolution and imaging speed. We compared various convolutional neural network (CNN) architectures, and selected a fully dense U-net (FD U-net) model that produced the best results. To mimic various undersampling conditions in practice, we artificially downsampled fully-sampled PAM images of mouse brain vasculature at different ratios. This allowed us to not only definitively establish the ground truth, but also train and test our deep learning model at various imaging conditions. Our results and numerical analysis have collectively demonstrated the robust performance of our model to reconstruct PAM images with as few as 2% of the original pixels, which may effectively shorten the imaging time without substantially sacrificing the image quality.
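
The training-pair construction can be sketched as follows, assuming a simple regular grid that keeps roughly 2% of the pixels and a bilinear pre-interpolation before the CNN; the paper's exact sampling patterns and FD U-Net architecture are not reproduced here, and `model` stands for any image-to-image network.

```python
import torch
import torch.nn.functional as F

def make_pair(full_img, keep_ratio=0.02):
    # full_img: (1, 1, H, W) fully sampled ground-truth image.
    h, w = full_img.shape[-2:]
    step = max(1, int(round((1.0 / keep_ratio) ** 0.5)))   # regular grid spacing
    # Keep only every `step`-th pixel, then pre-fill the grid by bilinear
    # interpolation so the network input has the original size.
    undersampled = full_img[..., ::step, ::step]
    coarse = F.interpolate(undersampled, size=(h, w),
                           mode="bilinear", align_corners=False)
    return coarse, full_img          # (network input, ground-truth target)

def train_step(model, optimizer, full_img):
    x, y = make_pair(full_img)
    optimizer.zero_grad()
    loss = F.mse_loss(model(x), y)   # model: any image-to-image CNN, e.g. a U-Net
    loss.backward()
    optimizer.step()
    return loss.item()
```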

* 12 pages, 7 main figures, 3 supplemental figures (see last 2 pages) 