Abstract: Deformable medical image registration is a fundamental task in medical image analysis, with applications in disease diagnosis, treatment planning, and image-guided interventions. Despite significant advances in deep learning-based registration methods, accurately aligning images that exhibit large deformations while preserving anatomical plausibility remains challenging. In this paper, we propose AD-RegNet, a novel Attention-Driven Framework for Non-Rigid Medical Image Registration that employs attention mechanisms to guide the registration process. Our approach combines a 3D UNet backbone with bidirectional cross-attention, which establishes correspondences between moving and fixed images at multiple scales. We introduce a regional adaptive attention mechanism that focuses on anatomically relevant structures, along with a multi-resolution deformation field synthesis strategy for accurate alignment. The method is evaluated on two distinct datasets, DIR-Lab thoracic 4D CT scans and IXI brain MRI scans, demonstrating its versatility across anatomical structures and imaging modalities. Experimental results show that our approach achieves performance competitive with state-of-the-art methods on both datasets while maintaining a favorable balance between registration accuracy and computational efficiency, making it suitable for clinical applications. A comprehensive evaluation using normalized cross-correlation (NCC), mean squared error (MSE), structural similarity (SSIM), Jacobian determinant, and target registration error (TRE) indicates that attention-guided registration improves alignment accuracy while ensuring anatomically plausible deformations.
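To make the central mechanism concrete, the following is a minimal sketch of bidirectional cross-attention between moving- and fixed-image feature maps, in the spirit of what the abstract describes. It is not the authors' implementation: the module name `BidirectionalCrossAttention`, the head count, and the residual wiring are illustrative assumptions, and PyTorch is assumed as the framework.

```python
import torch
import torch.nn as nn


class BidirectionalCrossAttention(nn.Module):
    """Let moving and fixed feature maps attend to each other (illustrative)."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.mov_to_fix = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.fix_to_mov = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, f_mov: torch.Tensor, f_fix: torch.Tensor):
        # f_mov, f_fix: (B, C, D, H, W) volumetric feature maps.
        b, c, d, h, w = f_mov.shape
        # Flatten spatial dims into token sequences of shape (B, D*H*W, C).
        mov = f_mov.flatten(2).transpose(1, 2)
        fix = f_fix.flatten(2).transpose(1, 2)
        # Each stream queries the other to establish cross-image correspondences.
        mov_att, _ = self.fix_to_mov(query=mov, key=fix, value=fix)
        fix_att, _ = self.mov_to_fix(query=fix, key=mov, value=mov)
        # Restore the volumetric layout and add residual connections.
        mov_out = f_mov + mov_att.transpose(1, 2).reshape(b, c, d, h, w)
        fix_out = f_fix + fix_att.transpose(1, 2).reshape(b, c, d, h, w)
        return mov_out, fix_out


# Toy usage at a coarse scale; attention cost grows with D*H*W, so such a
# block would typically sit at the low-resolution levels of a multi-scale UNet.
mov_feat = torch.randn(1, 64, 8, 8, 8)
fix_feat = torch.randn(1, 64, 8, 8, 8)
mov_out, fix_out = BidirectionalCrossAttention(channels=64)(mov_feat, fix_feat)
print(mov_out.shape, fix_out.shape)  # torch.Size([1, 64, 8, 8, 8]) twice
```

Applying the attention only at coarse scales, as hinted in the comment, is one plausible way to reconcile the quadratic token cost with full-resolution 3D volumes.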
Abstract: Unsupervised deformable image registration requires aligning complex anatomical structures without reference labels, making interpretability and reliability critical. Existing deep learning methods achieve considerable accuracy but often lack transparency, leading to error drift and reduced clinical trust. We propose a novel Multi-Hop Visual Chain of Reasoning (VCoR) framework that reformulates registration as a progressive reasoning process. Inspired by the iterative nature of clinical decision-making, the framework performs registration as a sequence of visual reasoning hops: each hop integrates a Localized Spatial Refinement (LSR) module to enrich feature representations and a Cross-Reference Attention (CRA) mechanism that guides the iterative refinement process while preserving anatomical consistency. This multi-hop strategy enables robust handling of large deformations and produces a transparent sequence of intermediate predictions accompanied by a theoretical bound. Beyond accuracy, the framework offers built-in interpretability by estimating uncertainty from the stability and convergence of the deformation fields across hops. Extensive evaluations on two challenging public datasets, DIR-Lab 4D CT (lung) and IXI T1-weighted MRI (brain), demonstrate that VCoR achieves competitive registration accuracy while offering rich intermediate visualizations and confidence measures. By embedding an implicit visual reasoning paradigm, we present an interpretable, reliable, and clinically viable framework for unsupervised medical image registration.
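As a rough illustration of the multi-hop idea, the sketch below composes residual deformation fields over several hops and derives an uncertainty proxy from how much the per-hop updates vary, loosely mirroring the stability-based confidence estimate described above. It is an assumption-laden toy, not the VCoR implementation: `hop_net` stands in for the LSR and CRA modules, the additive field composition is a simplification, and PyTorch is assumed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def warp(image, flow):
    """Warp a 3D image (B, C, D, H, W) with a voxel displacement field (B, 3, D, H, W)."""
    b, _, d, h, w = image.shape
    # Build a normalized identity grid in [-1, 1], ordered (x, y, z) for grid_sample.
    zs, ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, d), torch.linspace(-1, 1, h),
        torch.linspace(-1, 1, w), indexing="ij")
    grid = torch.stack((xs, ys, zs), dim=-1).unsqueeze(0).expand(b, -1, -1, -1, -1)
    # Convert voxel displacements to normalized coordinates and offset the grid.
    disp = flow.permute(0, 2, 3, 4, 1)
    scale = torch.tensor([2.0 / max(w - 1, 1), 2.0 / max(h - 1, 1), 2.0 / max(d - 1, 1)])
    return F.grid_sample(image, grid + disp * scale, align_corners=True)


def multi_hop_register(moving, fixed, hop_net, num_hops=3):
    """Refine the deformation over hops; return field, intermediates, uncertainty."""
    flow = torch.zeros(moving.size(0), 3, *moving.shape[2:])
    intermediates, deltas = [], []
    for _ in range(num_hops):
        warped = warp(moving, flow)
        residual = hop_net(torch.cat([warped, fixed], dim=1))  # residual displacement
        flow = flow + residual  # additive composition, a simplification
        intermediates.append(warp(moving, flow))  # transparent per-hop prediction
        deltas.append(residual.norm(dim=1, keepdim=True))
    # Uncertainty proxy: voxels whose update magnitude fluctuates across hops.
    uncertainty = torch.stack(deltas, dim=0).std(dim=0)
    return flow, intermediates, uncertainty


# Toy usage with a hypothetical single-convolution "hop network".
hop_net = nn.Conv3d(2, 3, kernel_size=3, padding=1)
moving, fixed = torch.randn(1, 1, 16, 16, 16), torch.randn(1, 1, 16, 16, 16)
flow, steps, unc = multi_hop_register(moving, fixed, hop_net)
print(flow.shape, len(steps), unc.shape)
```

The list of per-hop warps gives exactly the kind of inspectable intermediate trajectory the abstract emphasizes, while the spread of update magnitudes offers one simple, hedged reading of "stability across hops" as a confidence measure.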