Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zihan Li

Enhance the Image: Super Resolution using Artificial Intelligence in MRI

Jun 19, 2024

Ziyu Li, Zihan Li, Haoxiang Li, Qiuyun Fan, Karla L. Miller, Wenchuan Wu, Akshay S. Chaudhari, Qiyuan Tian

Figure 1 for Enhance the Image: Super Resolution using Artificial Intelligence in MRI

Figure 2 for Enhance the Image: Super Resolution using Artificial Intelligence in MRI

Figure 3 for Enhance the Image: Super Resolution using Artificial Intelligence in MRI

Figure 4 for Enhance the Image: Super Resolution using Artificial Intelligence in MRI

Abstract:This chapter provides an overview of deep learning techniques for improving the spatial resolution of MRI, ranging from convolutional neural networks, generative adversarial networks, to more advanced models including transformers, diffusion models, and implicit neural representations. Our exploration extends beyond the methodologies to scrutinize the impact of super-resolved images on clinical and neuroscientific assessments. We also cover various practical topics such as network architectures, image evaluation metrics, network loss functions, and training data specifics, including downsampling methods for simulating low-resolution images and dataset selection. Finally, we discuss existing challenges and potential future directions regarding the feasibility and reliability of deep learning-based MRI super-resolution, with the aim to facilitate its wider adoption to benefit various clinical and neuroscientific applications.

* A book chapter in Machine Learning in MRI: From methods to clinical translation. Copyright may be transferred without notice, after which this version may no longer be accessible

Via

Access Paper or Ask Questions

ScribFormer: Transformer Makes CNN Work Better for Scribble-based Medical Image Segmentation

Feb 03, 2024

Zihan Li, Yuan Zheng, Dandan Shan, Shuzhou Yang, Qingde Li, Beizhan Wang, Yuanting Zhang, Qingqi Hong, Dinggang Shen

Figure 1 for ScribFormer: Transformer Makes CNN Work Better for Scribble-based Medical Image Segmentation

Figure 2 for ScribFormer: Transformer Makes CNN Work Better for Scribble-based Medical Image Segmentation

Figure 3 for ScribFormer: Transformer Makes CNN Work Better for Scribble-based Medical Image Segmentation

Figure 4 for ScribFormer: Transformer Makes CNN Work Better for Scribble-based Medical Image Segmentation

Abstract:Most recent scribble-supervised segmentation methods commonly adopt a CNN framework with an encoder-decoder architecture. Despite its multiple benefits, this framework generally can only capture small-range feature dependency for the convolutional layer with the local receptive field, which makes it difficult to learn global shape information from the limited information provided by scribble annotations. To address this issue, this paper proposes a new CNN-Transformer hybrid solution for scribble-supervised medical image segmentation called ScribFormer. The proposed ScribFormer model has a triple-branch structure, i.e., the hybrid of a CNN branch, a Transformer branch, and an attention-guided class activation map (ACAM) branch. Specifically, the CNN branch collaborates with the Transformer branch to fuse the local features learned from CNN with the global representations obtained from Transformer, which can effectively overcome limitations of existing scribble-supervised segmentation methods. Furthermore, the ACAM branch assists in unifying the shallow convolution features and the deep convolution features to improve model's performance further. Extensive experiments on two public datasets and one private dataset show that our ScribFormer has superior performance over the state-of-the-art scribble-supervised segmentation methods, and achieves even better results than the fully-supervised segmentation methods. The code is released at https://github.com/HUANGLIZI/ScribFormer.

* Accepted by IEEE Transactions on Medical Imaging (TMI)

Via

Access Paper or Ask Questions

nnSAM: Plug-and-play Segment Anything Model Improves nnUNet Performance

Oct 02, 2023

Yunxiang Li, Bowen Jing, Zihan Li, Jing Wang, You Zhang

Abstract:The recent developments of foundation models in computer vision, especially the Segment Anything Model (SAM), allow scalable and domain-agnostic image segmentation to serve as a general-purpose segmentation tool. In parallel, the field of medical image segmentation has benefited significantly from specialized neural networks like the nnUNet, which is trained on domain-specific datasets and can automatically configure the network to tailor to specific segmentation challenges. To combine the advantages of foundation models and domain-specific models, we present nnSAM, which synergistically integrates the SAM model with the nnUNet model to achieve more accurate and robust medical image segmentation. The nnSAM model leverages the powerful and robust feature extraction capabilities of SAM, while harnessing the automatic configuration capabilities of nnUNet to promote dataset-tailored learning. Our comprehensive evaluation of nnSAM model on different sizes of training samples shows that it allows few-shot learning, which is highly relevant for medical image segmentation where high-quality, annotated data can be scarce and costly to obtain. By melding the strengths of both its predecessors, nnSAM positions itself as a potential new benchmark in medical image segmentation, offering a tool that combines broad applicability with specialized efficiency. The code is available at https://github.com/Kent0n-Li/Medical-Image-Segmentation.

* 12 pages

Via

Access Paper or Ask Questions

Can Large Language Models Understand Real-World Complex Instructions?

Sep 17, 2023

Qianyu He, Jie Zeng, Wenhao Huang, Lina Chen, Jin Xiao, Qianxi He, Xunzhe Zhou, Lida Chen, Xintao Wang, Yuncheng Huang(+7 more)

Figure 1 for Can Large Language Models Understand Real-World Complex Instructions?

Figure 2 for Can Large Language Models Understand Real-World Complex Instructions?

Figure 3 for Can Large Language Models Understand Real-World Complex Instructions?

Figure 4 for Can Large Language Models Understand Real-World Complex Instructions?

Abstract:Large language models (LLMs) can understand human instructions, showing their potential for pragmatic applications beyond traditional NLP tasks. However, they still struggle with complex instructions, which can be either complex task descriptions that require multiple tasks and constraints, or complex input that contains long context, noise, heterogeneous information and multi-turn format. Due to these features, LLMs often ignore semantic constraints from task descriptions, generate incorrect formats, violate length or sample count constraints, and be unfaithful to the input text. Existing benchmarks are insufficient to assess LLMs' ability to understand complex instructions, as they are close-ended and simple. To bridge this gap, we propose CELLO, a benchmark for evaluating LLMs' ability to follow complex instructions systematically. We design eight features for complex instructions and construct a comprehensive evaluation dataset from real-world scenarios. We also establish four criteria and develop corresponding metrics, as current ones are inadequate, biased or too strict and coarse-grained. We compare the performance of representative Chinese-oriented and English-oriented models in following complex instructions through extensive experiments. Resources of CELLO are publicly available at https://github.com/Abbey4799/CELLO.

Via

Access Paper or Ask Questions

ScribbleVC: Scribble-supervised Medical Image Segmentation with Vision-Class Embedding

Jul 30, 2023

Zihan Li, Yuan Zheng, Xiangde Luo, Dandan Shan, Qingqi Hong

Abstract:Medical image segmentation plays a critical role in clinical decision-making, treatment planning, and disease monitoring. However, accurate segmentation of medical images is challenging due to several factors, such as the lack of high-quality annotation, imaging noise, and anatomical differences across patients. In addition, there is still a considerable gap in performance between the existing label-efficient methods and fully-supervised methods. To address the above challenges, we propose ScribbleVC, a novel framework for scribble-supervised medical image segmentation that leverages vision and class embeddings via the multimodal information enhancement mechanism. In addition, ScribbleVC uniformly utilizes the CNN features and Transformer features to achieve better visual feature extraction. The proposed method combines a scribble-based approach with a segmentation network and a class-embedding module to produce accurate segmentation masks. We evaluate ScribbleVC on three benchmark datasets and compare it with state-of-the-art methods. The experimental results demonstrate that our method outperforms existing approaches in terms of accuracy, robustness, and efficiency. The datasets and code are released on GitHub.

* Accepted by ACM MM 2023, project page: https://github.com/HUANGLIZI/ScribbleVC

Via

Access Paper or Ask Questions

SwinMM: Masked Multi-view with Swin Transformers for 3D Medical Image Segmentation

Jul 24, 2023

Yiqing Wang, Zihan Li, Jieru Mei, Zihao Wei, Li Liu, Chen Wang, Shengtian Sang, Alan Yuille, Cihang Xie, Yuyin Zhou

Figure 1 for SwinMM: Masked Multi-view with Swin Transformers for 3D Medical Image Segmentation

Figure 2 for SwinMM: Masked Multi-view with Swin Transformers for 3D Medical Image Segmentation

Figure 3 for SwinMM: Masked Multi-view with Swin Transformers for 3D Medical Image Segmentation

Figure 4 for SwinMM: Masked Multi-view with Swin Transformers for 3D Medical Image Segmentation

Abstract:Recent advancements in large-scale Vision Transformers have made significant strides in improving pre-trained models for medical image segmentation. However, these methods face a notable challenge in acquiring a substantial amount of pre-training data, particularly within the medical field. To address this limitation, we present Masked Multi-view with Swin Transformers (SwinMM), a novel multi-view pipeline for enabling accurate and data-efficient self-supervised medical image analysis. Our strategy harnesses the potential of multi-view information by incorporating two principal components. In the pre-training phase, we deploy a masked multi-view encoder devised to concurrently train masked multi-view observations through a range of diverse proxy tasks. These tasks span image reconstruction, rotation, contrastive learning, and a novel task that employs a mutual learning paradigm. This new task capitalizes on the consistency between predictions from various perspectives, enabling the extraction of hidden multi-view information from 3D medical data. In the fine-tuning stage, a cross-view decoder is developed to aggregate the multi-view information through a cross-attention block. Compared with the previous state-of-the-art self-supervised learning method Swin UNETR, SwinMM demonstrates a notable advantage on several medical image segmentation tasks. It allows for a smooth integration of multi-view information, significantly boosting both the accuracy and data-efficiency of the model. Code and models are available at https://github.com/UCSC-VLAA/SwinMM/.

* MICCAI 2023; project page: https://github.com/UCSC-VLAA/SwinMM/

Via

Access Paper or Ask Questions

Beyond the Obvious: Evaluating the Reasoning Ability In Real-life Scenarios of Language Models on Life Scapes Reasoning Benchmark~(LSR-Benchmark)

Jul 11, 2023

Zhouhong Gu, Zihan Li, Lin Zhang, Zhuozhi Xiong, Sihang Jiang, Xiaoxuan Zhu, Shusen Wang, Zili Wang, Jianchen Wang, Haoning Ye(+4 more)

Figure 1 for Beyond the Obvious: Evaluating the Reasoning Ability In Real-life Scenarios of Language Models on Life Scapes Reasoning Benchmark~(LSR-Benchmark)

Figure 2 for Beyond the Obvious: Evaluating the Reasoning Ability In Real-life Scenarios of Language Models on Life Scapes Reasoning Benchmark~(LSR-Benchmark)

Figure 3 for Beyond the Obvious: Evaluating the Reasoning Ability In Real-life Scenarios of Language Models on Life Scapes Reasoning Benchmark~(LSR-Benchmark)

Figure 4 for Beyond the Obvious: Evaluating the Reasoning Ability In Real-life Scenarios of Language Models on Life Scapes Reasoning Benchmark~(LSR-Benchmark)

Abstract:This paper introduces the Life Scapes Reasoning Benchmark (LSR-Benchmark), a novel dataset targeting real-life scenario reasoning, aiming to close the gap in artificial neural networks' ability to reason in everyday contexts. In contrast to domain knowledge reasoning datasets, LSR-Benchmark comprises free-text formatted questions with rich information on real-life scenarios, human behaviors, and character roles. The dataset consists of 2,162 questions collected from open-source online sources and is manually annotated to improve its quality. Experiments are conducted using state-of-the-art language models, such as gpt3.5-turbo and instruction fine-tuned llama models, to test the performance in LSR-Benchmark. The results reveal that humans outperform these models significantly, indicating a persisting challenge for machine learning models in comprehending daily human life.

Via

Access Paper or Ask Questions

Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation

Jun 15, 2023

Zhouhong Gu, Xiaoxuan Zhu, Haoning Ye, Lin Zhang, Jianchen Wang, Sihang Jiang, Zhuozhi Xiong, Zihan Li, Qianyu He, Rui Xu(+6 more)

Figure 1 for Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation

Figure 2 for Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation

Figure 3 for Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation

Figure 4 for Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation

Abstract:New Natural Langauge Process~(NLP) benchmarks are urgently needed to align with the rapid development of large language models (LLMs). We present Xiezhi, the most comprehensive evaluation suite designed to assess holistic domain knowledge. Xiezhi comprises multiple-choice questions across 516 diverse disciplines ranging from 13 different subjects with 249,587 questions and accompanied by Xiezhi-Specialty and Xiezhi-Interdiscipline, both with 15k questions. We conduct evaluation of the 47 cutting-edge LLMs on Xiezhi. Results indicate that LLMs exceed average performance of humans in science, engineering, agronomy, medicine, and art, but fall short in economics, jurisprudence, pedagogy, literature, history, and management. We anticipate Xiezhi will help analyze important strengths and shortcomings of LLMs, and the benchmark is released in~\url{https://github.com/MikeGu721/XiezhiBenchmark}.

* Under review of NeurIPS 2023

Via

Access Paper or Ask Questions

Multi-view MERA Subspace Clustering

May 16, 2023

Zhen Long, Ce Zhu, Jie Chen, Zihan Li, Yazhou Ren, Yipeng Liu

Abstract:Tensor-based multi-view subspace clustering (MSC) can capture high-order correlation in the self-representation tensor. Current tensor decompositions for MSC suffer from highly unbalanced unfolding matrices or rotation sensitivity, failing to fully explore inter/intra-view information. Using the advanced tensor network, namely, multi-scale entanglement renormalization ansatz (MERA), we propose a low-rank MERA based MSC (MERA-MSC) algorithm, where MERA factorizes a tensor into contractions of one top core factor and the rest orthogonal/semi-orthogonal factors. Benefiting from multiple interactions among orthogonal/semi-orthogonal (low-rank) factors, the low-rank MERA has a strong representation power to capture the complex inter/intra-view information in the self-representation tensor. The alternating direction method of multipliers is adopted to solve the optimization model. Experimental results on five multi-view datasets demonstrate MERA-MSC has superiority against the compared algorithms on six evaluation metrics. Furthermore, we extend MERA-MSC by incorporating anchor learning to develop a scalable low-rank MERA based multi-view clustering method (sMREA-MVC). The effectiveness and efficiency of sMERA-MVC have been validated on three large-scale multi-view datasets. To our knowledge, this is the first work to introduce MERA to the multi-view clustering topic. The codes of MERA-MSC and sMERA-MVC are publicly available at https://github.com/longzhen520/MERA-MSC.

Via

Access Paper or Ask Questions

DiscoPrompt: Path Prediction Prompt Tuning for Implicit Discourse Relation Recognition

May 06, 2023

Chunkit Chan, Xin Liu, Jiayang Cheng, Zihan Li, Yangqiu Song, Ginny Y. Wong, Simon See

Figure 1 for DiscoPrompt: Path Prediction Prompt Tuning for Implicit Discourse Relation Recognition

Figure 2 for DiscoPrompt: Path Prediction Prompt Tuning for Implicit Discourse Relation Recognition

Figure 3 for DiscoPrompt: Path Prediction Prompt Tuning for Implicit Discourse Relation Recognition

Figure 4 for DiscoPrompt: Path Prediction Prompt Tuning for Implicit Discourse Relation Recognition

Abstract:Implicit Discourse Relation Recognition (IDRR) is a sophisticated and challenging task to recognize the discourse relations between the arguments with the absence of discourse connectives. The sense labels for each discourse relation follow a hierarchical classification scheme in the annotation process (Prasad et al., 2008), forming a hierarchy structure. Most existing works do not well incorporate the hierarchy structure but focus on the syntax features and the prior knowledge of connectives in the manner of pure text classification. We argue that it is more effective to predict the paths inside the hierarchical tree (e.g., "Comparison -> Contrast -> however") rather than flat labels (e.g., Contrast) or connectives (e.g., however). We propose a prompt-based path prediction method to utilize the interactive information and intrinsic senses among the hierarchy in IDRR. This is the first work that injects such structure information into pre-trained language models via prompt tuning, and the performance of our solution shows significant and consistent improvement against competitive baselines.

* Accepted to Findings of ACL 2023

Via

Access Paper or Ask Questions