on behalf of the PINNACLE consortium
Abstract:Accurate motion estimation at high acceleration factors enables rapid motion-compensated reconstruction in Magnetic Resonance Imaging (MRI) without compromising the diagnostic image quality. In this work, we introduce an attention-aware deep learning-based framework that can perform non-rigid pairwise registration for fully sampled and accelerated MRI. We extract local visual representations to build similarity maps between the registered image pairs at multiple resolution levels and additionally leverage long-range contextual information using a transformer-based module to alleviate ambiguities in the presence of artifacts caused by undersampling. We combine local and global dependencies to perform simultaneous coarse and fine motion estimation. The proposed method was evaluated on in-house acquired fully sampled and accelerated data of 101 patients and 62 healthy subjects undergoing cardiac and thoracic MRI. The impact of motion estimation accuracy on the downstream task of motion-compensated reconstruction was analyzed. We demonstrate that our model derives reliable and consistent motion fields across different sampling trajectories (Cartesian and radial) and acceleration factors of up to 16x for cardiac motion and 30x for respiratory motion and achieves superior image quality in motion-compensated reconstruction qualitatively and quantitatively compared to conventional and recent deep learning-based approaches. The code is publicly available at https://github.com/lab-midas/GMARAFT.




Abstract:Report generation models offer fine-grained textual interpretations of medical images like chest X-rays, yet they often lack interactivity (i.e. the ability to steer the generation process through user queries) and localized interpretability (i.e. visually grounding their predictions), which we deem essential for future adoption in clinical practice. While there have been efforts to tackle these issues, they are either limited in their interactivity by not supporting textual queries or fail to also offer localized interpretability. Therefore, we propose a novel multitask architecture and training paradigm integrating textual prompts and bounding boxes for diverse aspects like anatomical regions and pathologies. We call this approach the Chest X-Ray Explainer (ChEX). Evaluations across a heterogeneous set of 9 chest X-ray tasks, including localized image interpretation and report generation, showcase its competitiveness with SOTA models while additional analysis demonstrates ChEX's interactive capabilities.




Abstract:Unsupervised anomaly detection enables the identification of potential pathological areas by juxtaposing original images with their pseudo-healthy reconstructions generated by models trained exclusively on normal images. However, the clinical interpretation of resultant anomaly maps presents a challenge due to a lack of detailed, understandable explanations. Recent advancements in language models have shown the capability of mimicking human-like understanding and providing detailed descriptions. This raises an interesting question: \textit{How can language models be employed to make the anomaly maps more explainable?} To the best of our knowledge, we are the first to leverage a language model for unsupervised anomaly detection, for which we construct a dataset with different questions and answers. Additionally, we present a novel multi-image visual question answering framework tailored for anomaly detection, incorporating diverse feature fusion strategies to enhance visual knowledge extraction. Our experiments reveal that the framework, augmented by our new Knowledge Q-Former module, adeptly answers questions on the anomaly detection dataset. Besides, integrating anomaly maps as inputs distinctly aids in improving the detection of unseen pathologies.




Abstract:Cardiac magnetic resonance (CMR) image acquisition requires subjects to hold their breath while 2D cine images are acquired. This process assumes that the heart remains in the same position across all slices. However, differences in breathhold positions or patient motion introduce 3D slice misalignments. In this work, we propose an algorithm that simultaneously aligns all SA and LA slices by maximizing the pair-wise intensity agreement between their intersections. Unlike previous works, our approach is formulated as a subject-specific optimization problem and requires no prior knowledge of the underlying anatomy. We quantitatively demonstrate that the proposed method is robust against a large range of rotations and translations by synthetically misaligning 10 motion-free datasets and aligning them back using the proposed method.




Abstract:Anatomical atlases are widely used for population analysis. Conditional atlases target a particular sub-population defined via certain conditions (e.g. demographics or pathologies) and allow for the investigation of fine-grained anatomical differences - such as morphological changes correlated with age. Existing approaches use either registration-based methods that are unable to handle large anatomical variations or generative models, which can suffer from training instabilities and hallucinations. To overcome these limitations, we use latent diffusion models to generate deformation fields, which transform a general population atlas into one representing a specific sub-population. By generating a deformation field and registering the conditional atlas to a neighbourhood of images, we ensure structural plausibility and avoid hallucinations, which can occur during direct image synthesis. We compare our method to several state-of-the-art atlas generation methods in experiments using 5000 brain as well as whole-body MR images from UK Biobank. Our method generates highly realistic atlases with smooth transformations and high anatomical fidelity, outperforming the baselines.
Abstract:Topological accuracy in medical image segmentation is a highly important property for downstream applications such as network analysis and flow modeling in vessels or cell counting. Recently, significant methodological advancements have brought well-founded concepts from algebraic topology to binary segmentation. However, these approaches have been underexplored in multi-class segmentation scenarios, where topological errors are common. We propose a general loss function for topologically faithful multi-class segmentation extending the recent Betti matching concept, which is based on induced matchings of persistence barcodes. We project the N-class segmentation problem to N single-class segmentation tasks, which allows us to use 1-parameter persistent homology making training of neural networks computationally feasible. We validate our method on a comprehensive set of four medical datasets with highly variant topological characteristics. Our loss formulation significantly enhances topological correctness in cardiac, cell, artery-vein, and Circle of Willis segmentation.




Abstract:Diffusion models have advanced unsupervised anomaly detection by improving the transformation of pathological images into pseudo-healthy equivalents. Nonetheless, standard approaches may compromise critical information during pathology removal, leading to restorations that do not align with unaffected regions in the original scans. Such discrepancies can inadvertently increase false positive rates and reduce specificity, complicating radiological evaluations. This paper introduces Temporal Harmonization for Optimal Restoration (THOR), which refines the de-noising process by integrating implicit guidance through temporal anomaly maps. THOR aims to preserve the integrity of healthy tissue in areas unaffected by pathology. Comparative evaluations show that THOR surpasses existing diffusion-based methods in detecting and segmenting anomalies in brain MRIs and wrist X-rays. Code: https://github.com/ci-ber/THOR_DDPM.




Abstract:We introduce a conditional implicit neural atlas (CINA) for spatio-temporal atlas generation from Magnetic Resonance Images (MRI) of the neurotypical and pathological fetal brain, that is fully independent of affine or non-rigid registration. During training, CINA learns a general representation of the fetal brain and encodes subject specific information into latent code. After training, CINA can construct a faithful atlas with tissue probability maps of the fetal brain for any gestational age (GA) and anatomical variation covered within the training domain. Thus, CINA is competent to represent both, neurotypical and pathological brains. Furthermore, a trained CINA model can be fit to brain MRI of unseen subjects via test-time optimization of the latent code. CINA can then produce probabilistic tissue maps tailored to a particular subject. We evaluate our method on a total of 198 T2 weighted MRI of normal and abnormal fetal brains from the dHCP and FeTA datasets. We demonstrate CINA's capability to represent a fetal brain atlas that can be flexibly conditioned on GA and on anatomical variations like ventricular volume or degree of cortical folding, making it a suitable tool for modeling both neurotypical and pathological brains. We quantify the fidelity of our atlas by means of tissue segmentation and age prediction and compare it to an established baseline. CINA demonstrates superior accuracy for neurotypical brains and pathological brains with ventriculomegaly. Moreover, CINA scores a mean absolute error of 0.23 weeks in fetal brain age prediction, further confirming an accurate representation of fetal brain development.
Abstract:Image reconstruction attacks on machine learning models pose a significant risk to privacy by potentially leaking sensitive information. Although defending against such attacks using differential privacy (DP) has proven effective, determining appropriate DP parameters remains challenging. Current formal guarantees on data reconstruction success suffer from overly theoretical assumptions regarding adversary knowledge about the target data, particularly in the image domain. In this work, we empirically investigate this discrepancy and find that the practicality of these assumptions strongly depends on the domain shift between the data prior and the reconstruction target. We propose a reconstruction attack based on diffusion models (DMs) that assumes adversary access to real-world image priors and assess its implications on privacy leakage under DP-SGD. We show that (1) real-world data priors significantly influence reconstruction success, (2) current reconstruction bounds do not model the risk posed by data priors well, and (3) DMs can serve as effective auditing tools for visualizing privacy leakage.
Abstract:Analyzing temporal developments is crucial for the accurate prognosis of many medical conditions. Temporal changes that occur over short time scales are key to assessing the health of physiological functions, such as the cardiac cycle. Moreover, tracking longer term developments that occur over months or years in evolving processes, such as age-related macular degeneration (AMD), is essential for accurate prognosis. Despite the importance of both short and long term analysis to clinical decision making, they remain understudied in medical deep learning. State of the art methods for spatiotemporal representation learning, developed for short natural videos, prioritize the detection of temporal constants rather than temporal developments. Moreover, they do not account for varying time intervals between acquisitions, which are essential for contextualizing observed changes. To address these issues, we propose two approaches. First, we combine clip-level contrastive learning with a novel temporal embedding to adapt to irregular time series. Second, we propose masking and predicting latent frame representations of the temporal sequence. Our two approaches outperform all prior methods on temporally-dependent tasks including cardiac output estimation and three prognostic AMD tasks. Overall, this enables the automated analysis of temporal patterns which are typically overlooked in applications of deep learning to medicine.