Medical image registration is vital for disease diagnosis and treatment with its ability to merge diverse information of images, which may be captured under different times, angles, or modalities. Although several surveys have reviewed the development of medical image registration, these surveys have not systematically summarized methodologies of existing medical image registration methods. To this end, we provide a comprehensive review of these methods from traditional and deep learning-based directions, aiming to help audiences understand the development of medical image registration quickly. In particular, we review recent advances in retinal image registration at the end of each section, which has not attracted much attention. Additionally, we also discuss the current challenges of retinal image registration and provide insights and prospects for future research.
Information-seeking dialogue systems are widely used in e-commerce systems, with answers that must be tailored to fit the specific settings of the online system. Given the user query, the information-seeking dialogue systems first retrieve a subset of response candidates, then further select the best response from the candidate set through re-ranking. Current methods mainly retrieve response candidates based solely on the current query, however, incorporating similar questions could introduce more diverse content, potentially refining the representation and improving the matching process. Hence, in this paper, we proposed a Query-bag based Pseudo Relevance Feedback framework (QB-PRF), which constructs a query-bag with related queries to serve as pseudo signals to guide information-seeking conversations. Concretely, we first propose a Query-bag Selection module (QBS), which utilizes contrastive learning to train the selection of synonymous queries in an unsupervised manner by leveraging the representations learned from pre-trained VAE. Secondly, we come up with a Query-bag Fusion module (QBF) that fuses synonymous queries to enhance the semantic representation of the original query through multidimensional attention computation. We verify the effectiveness of the QB-PRF framework on two competitive pretrained backbone models, including BERT and GPT-2. Experimental results on two benchmark datasets show that our framework achieves superior performance over strong baselines.
In the digital era, the rapid propagation of fake news and rumors via social networks brings notable societal challenges and impacts public opinion regulation. Traditional fake news modeling typically forecasts the general popularity trends of different groups or numerically represents opinions shift. However, these methods often oversimplify real-world complexities and overlook the rich semantic information of news text. The advent of large language models (LLMs) provides the possibility of modeling subtle dynamics of opinion. Consequently, in this work, we introduce a Fake news Propagation Simulation framework (FPS) based on LLM, which studies the trends and control of fake news propagation in detail. Specifically, each agent in the simulation represents an individual with a distinct personality. They are equipped with both short-term and long-term memory, as well as a reflective mechanism to mimic human-like thinking. Every day, they engage in random opinion exchanges, reflect on their thinking, and update their opinions. Our simulation results uncover patterns in fake news propagation related to topic relevance, and individual traits, aligning with real-world observations. Additionally, we evaluate various intervention strategies and demonstrate that early and appropriately frequent interventions strike a balance between governance cost and effectiveness, offering valuable insights for practical applications. Our study underscores the significant utility and potential of LLMs in combating fake news.
The morphologies of vessel-like structures, such as blood vessels and nerve fibres, play significant roles in disease diagnosis, e.g., Parkinson's disease. Deep network-based refinement segmentation methods have recently achieved promising vessel-like structure segmentation results. There are still two challenges: (1) existing methods have limitations in rehabilitating subsection ruptures in segmented vessel-like structures; (2) they are often overconfident in predicted segmentation results. To tackle these two challenges, this paper attempts to leverage the potential of spatial interconnection relationships among subsection ruptures from the structure rehabilitation perspective. Based on this, we propose a novel Vessel-like Structure Rehabilitation Network (VSR-Net) to rehabilitate subsection ruptures and improve the model calibration based on coarse vessel-like structure segmentation results. VSR-Net first constructs subsection rupture clusters with Curvilinear Clustering Module (CCM). Then, the well-designed Curvilinear Merging Module (CMM) is applied to rehabilitate the subsection ruptures to obtain the refined vessel-like structures. Extensive experiments on five 2D/3D medical image datasets show that VSR-Net significantly outperforms state-of-the-art (SOTA) refinement segmentation methods with lower calibration error. Additionally, we provide quantitative analysis to explain the morphological difference between the rehabilitation results of VSR-Net and ground truth (GT), which is smaller than SOTA methods and GT, demonstrating that our method better rehabilitates vessel-like structures by restoring subsection ruptures.
Referring Remote Sensing Image Segmentation (RRSIS) is a new challenge that combines computer vision and natural language processing, delineating specific regions in aerial images as described by textual queries. Traditional Referring Image Segmentation (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery, leading to suboptimal segmentation results. To address these challenges, we introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS. RMSIN incorporates an Intra-scale Interaction Module (IIM) to effectively address the fine-grained detail required at multiple scales and a Cross-scale Interaction Module (CIM) for integrating these details coherently across the network. Furthermore, RMSIN employs an Adaptive Rotated Convolution (ARC) to account for the diverse orientations of objects, a novel contribution that significantly enhances segmentation accuracy. To assess the efficacy of RMSIN, we have curated an expansive dataset comprising 17,402 image-caption-mask triplets, which is unparalleled in terms of scale and variety. This dataset not only presents the model with a wide range of spatial and rotational scenarios but also establishes a stringent benchmark for the RRSIS task, ensuring a rigorous evaluation of performance. Our experimental evaluations demonstrate the exceptional performance of RMSIN, surpassing existing state-of-the-art models by a significant margin. All datasets and code are made available at https://github.com/Lsan2401/RMSIN.
Early-stage diabetic retinopathy (DR) presents challenges in clinical diagnosis due to inconspicuous and minute microangioma lesions, resulting in limited research in this area. Additionally, the potential of emerging foundation models, such as the segment anything model (SAM), in medical scenarios remains rarely explored. In this work, we propose a human-in-the-loop, label-free early DR diagnosis framework called GlanceSeg, based on SAM. GlanceSeg enables real-time segmentation of microangioma lesions as ophthalmologists review fundus images. Our human-in-the-loop framework integrates the ophthalmologist's gaze map, allowing for rough localization of minute lesions in fundus images. Subsequently, a saliency map is generated based on the located region of interest, which provides prompt points to assist the foundation model in efficiently segmenting microangioma lesions. Finally, a domain knowledge filter refines the segmentation of minute lesions. We conducted experiments on two newly-built public datasets, i.e., IDRiD and Retinal-Lesions, and validated the feasibility and superiority of GlanceSeg through visualized illustrations and quantitative measures. Additionally, we demonstrated that GlanceSeg improves annotation efficiency for clinicians and enhances segmentation performance through fine-tuning using annotations. This study highlights the potential of GlanceSeg-based annotations for self-model optimization, leading to enduring performance advancements through continual learning.
Pathological myopia (PM) is the leading ocular disease for impaired vision and blindness worldwide. The key to detecting PM as early as possible is to detect informative features in global and local lesion regions, such as fundus tessellation, atrophy and maculopathy. However, applying classical convolutional neural networks (CNNs) to efficiently highlight global and local lesion context information in feature maps is quite challenging. To tackle this issue, we aim to fully leverage the potential of global and local lesion information with attention module design. Based on this, we propose an efficient pyramid channel attention (EPCA) module, which dynamically explores the relative importance of global and local lesion context information in feature maps. Then we combine the EPCA module with the backbone network to construct EPCA-Net for automatic PM detection based on fundus images. In addition, we construct a PM dataset termed PM-fundus by collecting fundus images of PM from publicly available datasets (e.g., the PALM dataset and ODIR dataset). The comprehensive experiments are conducted on three datasets, demonstrating that our EPCA-Net outperforms state-of-the-art methods in detecting PM. Furthermore, motivated by the recent pretraining-and-finetuning paradigm, we attempt to adapt pre-trained natural image models for PM detection by freezing them and treating the EPCA module and other attention modules as the adapters. The results show that our method with the pretraining-and-finetuning paradigm achieves competitive performance through comparisons to part of methods with traditional fine-tuning methods with fewer tunable parameters.
Deep neural networks (DNNs) have promoted the development of computer aided diagnosis (CAD) systems for fundus diseases, helping ophthalmologists reduce missed diagnosis and misdiagnosis rate. However, the majority of CAD systems are data-driven but lack of medical prior knowledge which can be performance-friendly. In this regard, we innovatively proposed a human-in-the-loop (HITL) CAD system by leveraging ophthalmologists' eye-tracking information, which is more efficient and accurate. Concretely, the HITL CAD system was implemented on the multiple instance learning (MIL), where eye-tracking gaze maps were beneficial to cherry-pick diagnosis-related instances. Furthermore, the dual-cross-attention MIL (DCAMIL) network was utilized to curb the adverse effects of noisy instances. Meanwhile, both sequence augmentation module and domain adversarial module were introduced to enrich and standardize instances in the training bag, respectively, thereby enhancing the robustness of our method. We conduct comparative experiments on our newly constructed datasets (namely, AMD-Gaze and DR-Gaze), respectively for the AMD and early DR detection. Rigorous experiments demonstrate the feasibility of our HITL CAD system and the superiority of the proposed DCAMIL, fully exploring the ophthalmologists' eye-tracking information. These investigations indicate that physicians' gaze maps, as medical prior knowledge, is potential to contribute to the CAD systems of clinical diseases.
Spatial attention mechanism has been widely incorporated into deep convolutional neural networks (CNNs) via long-range dependency capturing, significantly lifting the performance in computer vision, but it may perform poorly in medical imaging. Unfortunately, existing efforts are often unaware that long-range dependency capturing has limitations in highlighting subtle lesion regions, neglecting to exploit the potential of multi-scale pixel context information to improve the representational capability of CNNs. In this paper, we propose a practical yet lightweight architectural unit, Pyramid Pixel Context Recalibration (PPCR) module, which exploits multi-scale pixel context information to recalibrate pixel position in a pixel-independent manner adaptively. PPCR first designs a cross-channel pyramid pooling to aggregate multi-scale pixel context information, then eliminates the inconsistency among them by the well-designed pixel normalization, and finally estimates per pixel attention weight via a pixel context integration. PPCR can be flexibly plugged into modern CNNs with negligible overhead. Extensive experiments on five medical image datasets and CIFAR benchmarks empirically demonstrate the superiority and generalization of PPCR over state-of-the-art attention methods. The in-depth analyses explain the inherent behavior of PPCR in the decision-making process, improving the interpretability of CNNs.
Cataract is one of the leading causes of reversible visual impairment and blindness globally. Over the years, researchers have achieved significant progress in developing state-of-the-art artificial intelligence techniques for automatic cataract classification and grading, helping clinicians prevent and treat cataract in time. This paper provides a comprehensive survey of recent advances in machine learning for cataract classification and grading based on ophthalmic images. We summarize existing literature from two research directions: conventional machine learning techniques and deep learning techniques. This paper also provides insights into existing works of both merits and limitations. In addition, we discuss several challenges of automatic cataract classification and grading based on machine learning techniques and present possible solutions to these challenges for future research.