Jun Shi

Predictive Accuracy-Based Active Learning for Medical Image Segmentation

May 01, 2024

Jun Shi, Shulan Ruan, Ziqi Zhu, Minfan Zhao, Hong An, Xudong Xue, Bing Yan

Abstract: Active learning is considered a viable solution to alleviate the contradiction between the high dependency of deep learning-based segmentation methods on annotated data and the expensive pixel-level annotation cost of medical images. However, most existing methods suffer from unreliable uncertainty assessment and struggle to balance diversity and informativeness, leading to poor performance in segmentation tasks. In response, we propose an efficient Predictive Accuracy-based Active Learning (PAAL) method for medical image segmentation, which is the first to use predictive accuracy to define uncertainty. Specifically, PAAL mainly consists of an Accuracy Predictor (AP) and a Weighted Polling Strategy (WPS). The former is an attached learnable module that can accurately predict the segmentation accuracy of unlabeled samples relative to the target model with the predicted posterior probability. The latter provides an efficient hybrid querying scheme by combining predicted accuracy and feature representation, aiming to ensure the uncertainty and diversity of the acquired samples. Extensive experimental results on multiple datasets demonstrate the superiority of PAAL. PAAL achieves comparable accuracy to fully annotated data while reducing annotation costs by approximately 50% to 80%, showcasing significant potential in clinical applications. The code is available at https://github.com/shijun18/PAAL-MedSeg.
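
The idea of a hybrid query that trades off uncertainty against diversity can be illustrated with a small sketch. This is not the paper's Weighted Polling Strategy, whose details the abstract does not give. It assumes that a predicted accuracy per unlabeled sample and a cluster id per sample (for example, from clustering feature vectors) are already available, and the names hybrid_query, predicted_accuracy, and cluster_ids are hypothetical. Clusters are visited in turn, and each visit takes the remaining sample with the lowest predicted accuracy.

```python
from collections import defaultdict


def hybrid_query(predicted_accuracy, cluster_ids, budget):
    """Select up to `budget` sample indices (illustrative only, not PAAL's WPS).

    Diversity: clusters are polled round-robin, so picks are spread out.
    Uncertainty: within a cluster, lower predicted accuracy is taken first.
    """
    # Group sample indices by their (assumed, precomputed) cluster id.
    by_cluster = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        by_cluster[cid].append(idx)

    # One queue per cluster, ordered from least to most accurate prediction.
    queues = [
        sorted(members, key=lambda i: predicted_accuracy[i])
        for _, members in sorted(by_cluster.items())
    ]

    selected = []
    rank = 0
    # Round `rank` takes the rank-th most uncertain sample of every cluster.
    while len(selected) < budget and any(rank < len(q) for q in queues):
        for q in queues:
            if rank < len(q) and len(selected) < budget:
                selected.append(q[rank])
        rank += 1
    return selected


if __name__ == "__main__":
    acc = [0.9, 0.2, 0.8, 0.4, 0.95, 0.1]   # hypothetical predicted accuracies
    clusters = [0, 0, 1, 1, 2, 2]           # hypothetical cluster assignments
    print(hybrid_query(acc, clusters, budget=3))
```

With a budget equal to the number of clusters, every cluster contributes exactly one sample, which is the simplest way to see how the two criteria interact.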

* 9 pages, 4 figures

What Foundation Models can Bring for Robot Learning in Manipulation : A Survey

Apr 28, 2024

Dingzhe Li, Yixiang Jin, Yong A, Hongze Yu, Jun Shi, Xiaoshuai Hao, Peng Hao, Huaping Liu, Fuchun Sun, Bin Fang

Abstract: The realization of universal robots is an ultimate goal of researchers. However, a key hurdle in achieving this goal lies in the robots' ability to manipulate objects in their unstructured surrounding environments according to different tasks. The learning-based approach is considered an effective way to address generalization. The impressive performance of foundation models in the fields of computer vision and natural language suggests the potential of embedding foundation models into manipulation tasks as a viable path toward achieving general manipulation capability. However, we believe achieving general manipulation capability requires an overarching framework akin to that of autonomous driving. This framework should encompass multiple functional modules, with different foundation models assuming distinct roles in facilitating general manipulation capability. This survey focuses on the contributions of foundation models to robot learning for manipulation. We propose a comprehensive framework and detail how foundation models can address challenges in each module of the framework. In addition, we examine current approaches, outline challenges, suggest future research directions, and identify potential risks associated with integrating foundation models into this domain.

Deciphering the Impact of Pretraining Data on Large Language Models through Machine Unlearning

Feb 18, 2024

Yang Zhao, Li Du, Xiao Ding, Kai Xiong, Zhouhao Sun, Jun Shi, Ting Liu, Bing Qin

Abstract: Through pretraining on a corpus with various sources, Large Language Models (LLMs) have gained impressive performance. However, the impact of each component of the pretraining corpus remains opaque. As a result, the organization of the pretraining corpus is still empirical and may deviate from the optimal. To address this issue, we systematically analyze the impact of 48 datasets from 5 major categories of pretraining data of LLMs and measure their impacts on LLMs using benchmarks covering nine major categories of model capabilities. Our analyses provide empirical results about the contribution of multiple corpora to the performance of LLMs, along with their joint impact patterns, including complementary, orthogonal, and correlational relationships. We also identify a set of "high-impact data", such as Books, that is significantly related to a set of model capabilities. These findings provide insights into the organization of data to support more efficient pretraining of LLMs.

Pruner: An Efficient Cross-Platform Tensor Compiler with Dual Awareness

Feb 04, 2024

Liang Qiao, Jun Shi, Xiaoyu Hao, Xi Fang, Minfan Zhao, Ziqi Zhu, Junshi Chen, Hong An, Bing Li, Honghui Yuan (+1 more)

Abstract: Tensor program optimization on Deep Learning Accelerators (DLAs) is critical for efficient model deployment. Although search-based Deep Learning Compilers (DLCs) have achieved significant performance gains compared to manual methods, they still suffer from the persistent challenges of low search efficiency and poor cross-platform adaptability. In this paper, we propose Pruner, following hardware/software co-design principles to hierarchically boost tensor program optimization. Pruner comprises two primary components: a Parameterized Static Analyzer (PSA) and a Pattern-aware Cost Model (PaCM). The former serves as a hardware-aware and formulaic performance analysis tool, guiding the pruning of the search space, while the latter enables the performance prediction of tensor programs according to the critical data-flow patterns. Furthermore, to ensure effective cross-platform adaptation, we design a Momentum Transfer Learning (MTL) strategy using a Siamese network, which establishes a bidirectional feedback mechanism to improve the robustness of the pre-trained cost model. The extensive experimental results demonstrate the effectiveness and advancement of the proposed Pruner in various tensor program tuning tasks across both online and offline scenarios, with low resource overhead. The code is available at https://github.com/qiaolian9/Pruner.

Deep Unfolding Network with Spatial Alignment for multi-modal MRI reconstruction

Dec 28, 2023

Hao Zhang, Qi Wang, Jun Shi, Shihui Ying, Zhijie Wen

Abstract: Multi-modal Magnetic Resonance Imaging (MRI) offers complementary diagnostic information, but some modalities are limited by the long scanning time. To accelerate the whole acquisition process, MRI reconstruction of one modality from highly undersampled k-space data with another fully-sampled reference modality is an efficient solution. However, the misalignment between modalities, which is common in clinical practice, can negatively affect reconstruction quality. Existing deep learning-based methods that account for inter-modality misalignment perform better, but still share two main limitations: (1) the spatial alignment task is not adaptively integrated with the reconstruction process, resulting in insufficient complementarity between the two tasks; (2) the entire framework has weak interpretability. In this paper, we construct a novel Deep Unfolding Network with Spatial Alignment, termed DUN-SA, to appropriately embed the spatial alignment task into the reconstruction process. Concretely, we derive a novel joint alignment-reconstruction model with a specially designed cross-modal spatial alignment term. By relaxing the model into cross-modal spatial alignment and multi-modal reconstruction tasks, we propose an effective algorithm to solve this model alternately. Then, we unfold the iterative steps of the proposed algorithm and design corresponding network modules to build DUN-SA with interpretability. Through end-to-end training, we effectively compensate for spatial misalignment using only reconstruction loss, and utilize the progressively aligned reference modality to provide an inter-modality prior to improve the reconstruction of the target modality. Comprehensive experiments on three real datasets demonstrate that our method exhibits superior reconstruction performance compared to state-of-the-art methods.

RobotGPT: Robot Manipulation Learning from ChatGPT

Dec 03, 2023

Yixiang Jin, Dingzhe Li, Yong A, Jun Shi, Peng Hao, Fuchun Sun, Jianwei Zhang, Bin Fang

Abstract: We present RobotGPT, an innovative decision framework for robotic manipulation that prioritizes stability and safety. The execution code generated by ChatGPT cannot guarantee the stability and safety of the system. ChatGPT may provide different answers for the same task, leading to unpredictability. This instability prevents the direct integration of ChatGPT into the robot manipulation loop. Although setting the temperature to 0 can generate more consistent outputs, it may cause ChatGPT to lose diversity and creativity. Our objective is to leverage ChatGPT's problem-solving capabilities in robot manipulation and train a reliable agent. The framework includes an effective prompt structure and a robust learning model. Additionally, we introduce a metric for measuring task difficulty to evaluate ChatGPT's performance in robot manipulation. Furthermore, we evaluate RobotGPT in both simulation and real-world environments. Compared to directly using ChatGPT to generate code, our framework significantly improves task success rates, with an average increase from 38.5% to 91.5%. Therefore, training a RobotGPT by utilizing ChatGPT as an expert is a more stable approach compared to directly using ChatGPT as a task planner.
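
The remark about temperature can be made concrete with the standard temperature-scaled softmax used in many language-model samplers. The sketch below is generic and is not taken from the paper or from ChatGPT's implementation, which is not public; the function name and the logits are made up for illustration. As the temperature shrinks toward 0, the probability mass concentrates on the highest-scoring token, so repeated sampling becomes nearly deterministic at the cost of diversity. A temperature of exactly 0 is commonly treated as picking the argmax, since the formula would divide by zero.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing them by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; 0 is usually handled as argmax")
    scaled = [z / temperature for z in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
    for t in (1.0, 0.5, 0.05):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 4) for p in probs])
```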

Single-shot Phase Retrieval from a Fractional Fourier Transform Perspective

Nov 18, 2023

Yixiao Yang, Ran Tao, Kaixuan Wei, Jun Shi

Abstract: The realm of classical phase retrieval concerns itself with the arduous task of recovering a signal from its Fourier magnitude measurements, which are fraught with inherent ambiguities. A single-exposure intensity measurement is commonly deemed insufficient for the reconstruction of the primal signal, given that the absent phase component is imperative for the inverse transformation. In this work, we present a novel single-shot phase retrieval paradigm from a fractional Fourier transform (FrFT) perspective, which involves integrating the FrFT-based physical measurement model within a self-supervised reconstruction scheme. Specifically, the proposed FrFT-based measurement model addresses the aliasing artifacts problem in the numerical calculation of Fresnel diffraction, featuring adaptability to both short-distance and long-distance propagation scenarios. Moreover, the intensity measurement in the FrFT domain proves highly effective in alleviating the ambiguities of phase retrieval and relaxing the previous conditions on oversampled or multiple measurements in the Fourier domain. Furthermore, the proposed self-supervised reconstruction approach harnesses the fast discrete algorithm of FrFT alongside untrained neural network priors, thereby attaining preeminent results. Through numerical simulations, we demonstrate that both amplitude and phase objects can be effectively retrieved from a single-shot intensity measurement using the proposed approach and provide a promising technique for support-free coherent diffraction imaging.
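
The "inherent ambiguities" of Fourier magnitude measurements can be checked numerically on a toy signal. The sketch below uses a direct discrete Fourier transform in plain Python (not the FrFT and not the paper's code; all function names are illustrative) to show three classical trivial ambiguities: a circular shift, a conjugate flip, and a global phase rotation all leave the Fourier magnitudes unchanged, so magnitude-only data cannot tell these signals apart.

```python
import cmath


def dft_magnitudes(signal):
    """Return |DFT(signal)| computed from the direct O(n^2) definition."""
    n = len(signal)
    mags = []
    for k in range(n):
        coeff = sum(
            signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)
        )
        mags.append(abs(coeff))
    return mags


def circular_shift(signal, s):
    """Shift the signal by s samples with wrap-around."""
    n = len(signal)
    return [signal[(t - s) % n] for t in range(n)]


def conjugate_flip(signal):
    """Return y[t] = conj(x[-t mod n])."""
    n = len(signal)
    return [complex(signal[(-t) % n]).conjugate() for t in range(n)]


def same_magnitudes(a, b, tol=1e-9):
    """True if the two signals have matching Fourier magnitudes within tol."""
    return all(
        abs(x - y) < tol for x, y in zip(dft_magnitudes(a), dft_magnitudes(b))
    )


if __name__ == "__main__":
    x = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0]
    print(same_magnitudes(x, circular_shift(x, 2)))
    print(same_magnitudes(x, conjugate_flip(x)))
    print(same_magnitudes(x, [v * cmath.exp(0.7j) for v in x]))
```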

Sparse Sampling Transformer with Uncertainty-Driven Ranking for Unified Removal of Raindrops and Rain Streaks

Aug 27, 2023

Sixiang Chen, Tian Ye, Jinbin Bai, Erkang Chen, Jun Shi, Lei Zhu

Abstract: In the real world, image degradations caused by rain often exhibit a combination of rain streaks and raindrops, thereby increasing the challenges of recovering the underlying clean image. Note that the rain streaks and raindrops have diverse shapes, sizes, and locations in the captured image, and thus modeling the correlation between irregular degradations caused by rain artifacts is a necessary prerequisite for image deraining. This paper aims to present an efficient and flexible mechanism to learn and model degradation relationships in a global view, thereby achieving a unified removal of intricate rain scenes. To do so, we propose a Sparse Sampling Transformer based on Uncertainty-Driven Ranking, dubbed UDR-S2Former. Compared to previous methods, our UDR-S2Former has three merits. First, it can adaptively sample relevant image degradation information to model underlying degradation relationships. Second, explicit application of the uncertainty-driven ranking strategy can help the network attend to degradation features and understand the reconstruction process. Finally, experimental results show that our UDR-S2Former clearly outperforms state-of-the-art methods for all benchmarks.

* Accepted by ICCV'23

Multi-Scale Prototypical Transformer for Whole Slide Image Classification

Jul 05, 2023

Saisai Ding, Jun Wang, Juncheng Li, Jun Shi

Abstract: Whole slide image (WSI) classification is an essential task in computational pathology. Despite the recent advances in multiple instance learning (MIL) for WSI classification, accurate classification of WSIs remains challenging due to the extreme imbalance between the positive and negative instances in bags, and the complicated pre-processing to fuse multi-scale information of WSI. To this end, we propose a novel multi-scale prototypical Transformer (MSPT) for WSI classification, which includes a prototypical Transformer (PT) module and a multi-scale feature fusion module (MFFM). The PT is developed to reduce redundant instances in bags by integrating prototypical learning into the Transformer architecture. It substitutes all instances with cluster prototypes, which are then re-calibrated through the self-attention mechanism of the Transformer. Thereafter, an MFFM is proposed to fuse the clustered prototypes of different scales, which employs MLP-Mixer to enhance the information communication between prototypes. The experimental results on two public WSI datasets demonstrate that the proposed MSPT outperforms all the compared algorithms, suggesting its potential applications.

H-DenseFormer: An Efficient Hybrid Densely Connected Transformer for Multimodal Tumor Segmentation

Jul 04, 2023

Jun Shi, Hongyu Kan, Shulan Ruan, Ziqi Zhu, Minfan Zhao, Liang Qiao, Zhaohui Wang, Hong An, Xudong Xue

Abstract: Recently, deep learning methods have been widely used for tumor segmentation of multimodal medical images with promising results. However, most existing methods are limited by insufficient representational ability, a fixed number of modalities, and high computational complexity. In this paper, we propose a hybrid densely connected network for tumor segmentation, named H-DenseFormer, which combines the representational power of the Convolutional Neural Network (CNN) and the Transformer structures. Specifically, H-DenseFormer integrates a Transformer-based Multi-path Parallel Embedding (MPE) module that can take an arbitrary number of modalities as input to extract the fusion features from different modalities. Then, the multimodal fusion features are delivered to different levels of the encoder to enhance multimodal learning representation. In addition, we design a lightweight Densely Connected Transformer (DCT) block to replace the standard Transformer block, thus significantly reducing computational complexity. We conduct extensive experiments on two public multimodal datasets, HECKTOR21 and PI-CAI22. The experimental results show that our proposed method outperforms the existing state-of-the-art methods while having lower computational complexity. The source code is available at https://github.com/shijun18/H-DenseFormer.

* 11 pages, 2 figures. This paper has been accepted by Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2023
