Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xin Yang

College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou, P.R. China, Key Laboratory of Spectroscopy Sensing, Ministry of Agriculture and Rural Affairs, Hangzhou, P.R. China

Dual-view Correlation Hybrid Attention Network for Robust Holistic Mammogram Classification

Jun 19, 2023

Zhiwei Wang, Junlin Xian, Kangyi Liu, Xin Li, Qiang Li, Xin Yang

Abstract:Mammogram image is important for breast cancer screening, and typically obtained in a dual-view form, i.e., cranio-caudal (CC) and mediolateral oblique (MLO), to provide complementary information. However, previous methods mostly learn features from the two views independently, which violates the clinical knowledge and ignores the importance of dual-view correlation. In this paper, we propose a dual-view correlation hybrid attention network (DCHA-Net) for robust holistic mammogram classification. Specifically, DCHA-Net is carefully designed to extract and reinvent deep features for the two views, and meanwhile to maximize the underlying correlations between them. A hybrid attention module, consisting of local relation and non-local attention blocks, is proposed to alleviate the spatial misalignment of the paired views in the correlation maximization. A dual-view correlation loss is introduced to maximize the feature similarity between corresponding strip-like regions with equal distance to the chest wall, motivated by the fact that their features represent the same breast tissues, and thus should be highly-correlated. Experimental results on two public datasets, i.e., INbreast and CBIS-DDSM, demonstrate that DCHA-Net can well preserve and maximize feature correlations across views, and thus outperforms the state-of-the-arts for classifying a whole mammogram as malignant or not.

Via

Access Paper or Ask Questions

Inflated 3D Convolution-Transformer for Weakly-supervised Carotid Stenosis Grading with Ultrasound Videos

Jun 12, 2023

Xinrui Zhou, Yuhao Huang, Wufeng Xue, Xin Yang, Yuxin Zou, Qilong Ying, Yuanji Zhang, Jia Liu, Jie Ren, Dong Ni

Figure 1 for Inflated 3D Convolution-Transformer for Weakly-supervised Carotid Stenosis Grading with Ultrasound Videos

Figure 2 for Inflated 3D Convolution-Transformer for Weakly-supervised Carotid Stenosis Grading with Ultrasound Videos

Figure 3 for Inflated 3D Convolution-Transformer for Weakly-supervised Carotid Stenosis Grading with Ultrasound Videos

Figure 4 for Inflated 3D Convolution-Transformer for Weakly-supervised Carotid Stenosis Grading with Ultrasound Videos

Abstract:Localization of the narrowest position of the vessel and corresponding vessel and remnant vessel delineation in carotid ultrasound (US) are essential for carotid stenosis grading (CSG) in clinical practice. However, the pipeline is time-consuming and tough due to the ambiguous boundaries of plaque and temporal variation. To automatize this procedure, a large number of manual delineations are usually required, which is not only laborious but also not reliable given the annotation difficulty. In this study, we present the first video classification framework for automatic CSG. Our contribution is three-fold. First, to avoid the requirement of laborious and unreliable annotation, we propose a novel and effective video classification network for weakly-supervised CSG. Second, to ease the model training, we adopt an inflation strategy for the network, where pre-trained 2D convolution weights can be adapted into the 3D counterpart in our network for an effective warm start. Third, to enhance the feature discrimination of the video, we propose a novel attention-guided multi-dimension fusion (AMDF) transformer encoder to model and integrate global dependencies within and across spatial and temporal dimensions, where two lightweight cross-dimensional attention mechanisms are designed. Our approach is extensively validated on a large clinically collected carotid US video dataset, demonstrating state-of-the-art performance compared with strong competitors.

* Accepted by MICCAI 2023

Via

Access Paper or Ask Questions

Instructive Feature Enhancement for Dichotomous Medical Image Segmentation

Jun 06, 2023

Lian Liu, Han Zhou, Jiongquan Chen, Sijing Liu, Wenlong Shi, Dong Ni, Deng-Ping Fan, Xin Yang

Figure 1 for Instructive Feature Enhancement for Dichotomous Medical Image Segmentation

Figure 2 for Instructive Feature Enhancement for Dichotomous Medical Image Segmentation

Figure 3 for Instructive Feature Enhancement for Dichotomous Medical Image Segmentation

Figure 4 for Instructive Feature Enhancement for Dichotomous Medical Image Segmentation

Abstract:Deep neural networks have been widely applied in dichotomous medical image segmentation (DMIS) of many anatomical structures in several modalities, achieving promising performance. However, existing networks tend to struggle with task-specific, heavy and complex designs to improve accuracy. They made little instructions to which feature channels would be more beneficial for segmentation, and that may be why the performance and universality of these segmentation models are hindered. In this study, we propose an instructive feature enhancement approach, namely IFE, to adaptively select feature channels with rich texture cues and strong discriminability to enhance raw features based on local curvature or global information entropy criteria. Being plug-and-play and applicable for diverse DMIS tasks, IFE encourages the model to focus on texture-rich features which are especially important for the ambiguous and challenging boundary identification, simultaneously achieving simplicity, universality, and certain interpretability. To evaluate the proposed IFE, we constructed the first large-scale DMIS dataset Cosmos55k, which contains 55,023 images from 7 modalities and 26 anatomical structures. Extensive experiments show that IFE can improve the performance of classic segmentation networks across different anatomies and modalities with only slight modifications. Code is available at https://github.com/yezi-66/IFE

* Accepted by MICCAI 2023

Via

Access Paper or Ask Questions

Fourier Test-time Adaptation with Multi-level Consistency for Robust Classification

Jun 05, 2023

Yuhao Huang, Xin Yang, Xiaoqiong Huang, Xinrui Zhou, Haozhe Chi, Haoran Dou, Xindi Hu, Jian Wang, Xuedong Deng, Dong Ni

Figure 1 for Fourier Test-time Adaptation with Multi-level Consistency for Robust Classification

Figure 2 for Fourier Test-time Adaptation with Multi-level Consistency for Robust Classification

Figure 3 for Fourier Test-time Adaptation with Multi-level Consistency for Robust Classification

Figure 4 for Fourier Test-time Adaptation with Multi-level Consistency for Robust Classification

Abstract:Deep classifiers may encounter significant performance degradation when processing unseen testing data from varying centers, vendors, and protocols. Ensuring the robustness of deep models against these domain shifts is crucial for their widespread clinical application. In this study, we propose a novel approach called Fourier Test-time Adaptation (FTTA), which employs a dual-adaptation design to integrate input and model tuning, thereby jointly improving the model robustness. The main idea of FTTA is to build a reliable multi-level consistency measurement of paired inputs for achieving self-correction of prediction. Our contribution is two-fold. First, we encourage consistency in global features and local attention maps between the two transformed images of the same input. Here, the transformation refers to Fourier-based input adaptation, which can transfer one unseen image into source style to reduce the domain gap. Furthermore, we leverage style-interpolated images to enhance the global and local features with learnable parameters, which can smooth the consistency measurement and accelerate convergence. Second, we introduce a regularization technique that utilizes style interpolation consistency in the frequency space to encourage self-consistency in the logit space of the model output. This regularization provides strong self-supervised signals for robustness enhancement. FTTA was extensively validated on three large classification datasets with different modalities and organs. Experimental results show that FTTA is general and outperforms other strong state-of-the-art methods.

* Accepted by MICCAI 2023

Via

Access Paper or Ask Questions

Frame-Event Alignment and Fusion Network for High Frame Rate Tracking

May 25, 2023

Jiqing Zhang, Yuanchen Wang, Wenxi Liu, Meng Li, Jinpeng Bai, Baocai Yin, Xin Yang

Figure 1 for Frame-Event Alignment and Fusion Network for High Frame Rate Tracking

Figure 2 for Frame-Event Alignment and Fusion Network for High Frame Rate Tracking

Figure 3 for Frame-Event Alignment and Fusion Network for High Frame Rate Tracking

Figure 4 for Frame-Event Alignment and Fusion Network for High Frame Rate Tracking

Abstract:Most existing RGB-based trackers target low frame rate benchmarks of around 30 frames per second. This setting restricts the tracker's functionality in the real world, especially for fast motion. Event-based cameras as bioinspired sensors provide considerable potential for high frame rate tracking due to their high temporal resolution. However, event-based cameras cannot offer fine-grained texture information like conventional cameras. This unique complementarity motivates us to combine conventional frames and events for high frame rate object tracking under various challenging conditions. Inthispaper, we propose an end-to-end network consisting of multi-modality alignment and fusion modules to effectively combine meaningful information from both modalities at different measurement rates. The alignment module is responsible for cross-style and cross-frame-rate alignment between frame and event modalities under the guidance of the moving cues furnished by events. While the fusion module is accountable for emphasizing valuable features and suppressing noise information by the mutual complement between the two modalities. Extensive experiments show that the proposed approach outperforms state-of-the-art trackers by a significant margin in high frame rate tracking. With the FE240hz dataset, our approach achieves high frame rate tracking up to 240Hz.

Via

Access Paper or Ask Questions

A Subabdominal MRI Image Segmentation Algorithm Based on Multi-Scale Feature Pyramid Network and Dual Attention Mechanism

May 19, 2023

Yu Xiao, Xin Yang, Sijuan Huang, Yongkai Liu, Shuqin Chen, Lihua Guo

Abstract:This study aimed to solve the semantic gap and misalignment issue between encoding and decoding because of multiple convolutional and pooling operations in U-Net when segmenting subabdominal MRI images during rectal cancer treatment. A MRI Image Segmentation is proposed based on a multi-scale feature pyramid network and dual attention mechanism. Our innovation is the design of two modules: 1) a dilated convolution and multi-scale feature pyramid network are used in the encoding to avoid the semantic gap. 2) a dual attention mechanism is designed to maintain spatial information of U-Net and reduce misalignment. Experiments on a subabdominal MRI image dataset show the proposed method achieves better performance than others methods. In conclusion, a multi-scale feature pyramid network can reduce the semantic gap, and the dual attention mechanism can make an alignment of features between encoding and decoding.

* 19 pages,9 figures

Via

Access Paper or Ask Questions

Differentially Private Attention Computation

May 08, 2023

Yeqi Gao, Zhao Song, Xin Yang

Abstract:Large language models (LLMs) have had a profound impact on numerous aspects of daily life including natural language processing, content generation, research methodologies and so on. However, one crucial issue concerning the inference results of large language models is security and privacy. In many scenarios, the results generated by LLMs could possibly leak many confidential or copyright information. A recent beautiful and breakthrough work [Vyas, Kakade and Barak 2023] focus on such privacy issue of the LLMs from theoretical perspective. It is well-known that computing the attention matrix is one of the major task during the LLMs computation. Thus, how to give a provable privately guarantees of computing the attention matrix is an important research direction. Previous work [Alman and Song 2023, Brand, Song and Zhou 2023] have proposed provable tight result for fast computation of attention without considering privacy concerns. One natural mathematical formulation to quantity the privacy in theoretical computer science graduate school textbook is differential privacy. Inspired by [Vyas, Kakade and Barak 2023], in this work, we provide a provable result for showing how to differentially private approximate the attention matrix. From technique perspective, our result replies on a pioneering work in the area of differential privacy by [Alabi, Kothari, Tankala, Venkat and Zhang 2022].

Via

Access Paper or Ask Questions

Segment Anything Model for Medical Images?

May 01, 2023

Yuhao Huang, Xin Yang, Lian Liu, Han Zhou, Ao Chang, Xinrui Zhou, Rusi Chen, Junxuan Yu, Jiongquan Chen, Chaoyu Chen(+5 more)

Figure 1 for Segment Anything Model for Medical Images?

Figure 2 for Segment Anything Model for Medical Images?

Figure 3 for Segment Anything Model for Medical Images?

Figure 4 for Segment Anything Model for Medical Images?

Abstract:The Segment Anything Model (SAM) is the first foundation model for general image segmentation. It designed a novel promotable segmentation task, ensuring zero-shot image segmentation using the pre-trained model via two main modes including automatic everything and manual prompt. SAM has achieved impressive results on various natural image segmentation tasks. However, medical image segmentation (MIS) is more challenging due to the complex modalities, fine anatomical structures, uncertain and complex object boundaries, and wide-range object scales. Meanwhile, zero-shot and efficient MIS can well reduce the annotation time and boost the development of medical image analysis. Hence, SAM seems to be a potential tool and its performance on large medical datasets should be further validated. We collected and sorted 52 open-source datasets, and built a large medical segmentation dataset with 16 modalities, 68 objects, and 553K slices. We conducted a comprehensive analysis of different SAM testing strategies on the so-called COSMOS 553K dataset. Extensive experiments validate that SAM performs better with manual hints like points and boxes for object perception in medical images, leading to better performance in prompt mode compared to everything mode. Additionally, SAM shows remarkable performance in some specific objects and modalities, but is imperfect or even totally fails in other situations. Finally, we analyze the influence of different factors (e.g., the Fourier-based boundary complexity and size of the segmented objects) on SAM's segmentation performance. Extensive experiments validate that SAM's zero-shot segmentation capability is not sufficient to ensure its direct application to the MIS.

* 23 pages, 14 figures, 12 tables

Via

Access Paper or Ask Questions

Hierarchical Agent-based Reinforcement Learning Framework for Automated Quality Assessment of Fetal Ultrasound Video

Apr 14, 2023

Sijing Liu, Qilong Ying, Shuangchi He, Xin Yang, Dong Ni, Ruobing Huang

Figure 1 for Hierarchical Agent-based Reinforcement Learning Framework for Automated Quality Assessment of Fetal Ultrasound Video

Figure 2 for Hierarchical Agent-based Reinforcement Learning Framework for Automated Quality Assessment of Fetal Ultrasound Video

Figure 3 for Hierarchical Agent-based Reinforcement Learning Framework for Automated Quality Assessment of Fetal Ultrasound Video

Abstract:Ultrasound is the primary modality to examine fetal growth during pregnancy, while the image quality could be affected by various factors. Quality assessment is essential for controlling the quality of ultrasound images to guarantee both the perceptual and diagnostic values. Existing automated approaches often require heavy structural annotations and the predictions may not necessarily be consistent with the assessment results by human experts. Furthermore, the overall quality of a scan and the correlation between the quality of frames should not be overlooked. In this work, we propose a reinforcement learning framework powered by two hierarchical agents that collaboratively learn to perform both frame-level and video-level quality assessments. It is equipped with a specially-designed reward mechanism that considers temporal dependency among frame quality and only requires sparse binary annotations to train. Experimental results on a challenging fetal brain dataset verify that the proposed framework could perform dual-level quality assessment and its predictions correlate well with the subjective assessment results.

Via

Access Paper or Ask Questions

HyperThumbnail: Real-time 6K Image Rescaling with Rate-distortion Optimization

Apr 03, 2023

Chenyang Qi, Xin Yang, Ka Leong Cheng, Ying-Cong Chen, Qifeng Chen

Figure 1 for HyperThumbnail: Real-time 6K Image Rescaling with Rate-distortion Optimization

Figure 2 for HyperThumbnail: Real-time 6K Image Rescaling with Rate-distortion Optimization

Figure 3 for HyperThumbnail: Real-time 6K Image Rescaling with Rate-distortion Optimization

Figure 4 for HyperThumbnail: Real-time 6K Image Rescaling with Rate-distortion Optimization

Abstract:Contemporary image rescaling aims at embedding a high-resolution (HR) image into a low-resolution (LR) thumbnail image that contains embedded information for HR image reconstruction. Unlike traditional image super-resolution, this enables high-fidelity HR image restoration faithful to the original one, given the embedded information in the LR thumbnail. However, state-of-the-art image rescaling methods do not optimize the LR image file size for efficient sharing and fall short of real-time performance for ultra-high-resolution (e.g., 6K) image reconstruction. To address these two challenges, we propose a novel framework (HyperThumbnail) for real-time 6K rate-distortion-aware image rescaling. Our framework first embeds an HR image into a JPEG LR thumbnail by an encoder with our proposed quantization prediction module, which minimizes the file size of the embedding LR JPEG thumbnail while maximizing HR reconstruction quality. Then, an efficient frequency-aware decoder reconstructs a high-fidelity HR image from the LR one in real time. Extensive experiments demonstrate that our framework outperforms previous image rescaling baselines in rate-distortion performance and can perform 6K image reconstruction in real time.

* Accepted by CVPR 2023; Github Repository: https://github.com/AbnerVictor/HyperThumbnail

Via

Access Paper or Ask Questions