Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Nassir Navab

Computer Aided Medical Procedures, Technische Universit Munchen, Germany, Johns Hopkins University, Baltimore MD, USA

Prior-RadGraphFormer: A Prior-Knowledge-Enhanced Transformer for Generating Radiology Graphs from X-Rays

Mar 27, 2023

Yiheng Xiong, Jingsong Liu, Kamilia Zaripova, Sahand Sharifzadeh, Matthias Keicher, Nassir Navab

Figure 1 for Prior-RadGraphFormer: A Prior-Knowledge-Enhanced Transformer for Generating Radiology Graphs from X-Rays

Figure 2 for Prior-RadGraphFormer: A Prior-Knowledge-Enhanced Transformer for Generating Radiology Graphs from X-Rays

Figure 3 for Prior-RadGraphFormer: A Prior-Knowledge-Enhanced Transformer for Generating Radiology Graphs from X-Rays

Figure 4 for Prior-RadGraphFormer: A Prior-Knowledge-Enhanced Transformer for Generating Radiology Graphs from X-Rays

Abstract:The extraction of structured clinical information from free-text radiology reports in the form of radiology graphs has been demonstrated to be a valuable approach for evaluating the clinical correctness of report-generation methods. However, the direct generation of radiology graphs from chest X-ray (CXR) images has not been attempted. To address this gap, we propose a novel approach called Prior-RadGraphFormer that utilizes a transformer model with prior knowledge in the form of a probabilistic knowledge graph (PKG) to generate radiology graphs directly from CXR images. The PKG models the statistical relationship between radiology entities, including anatomical structures and medical observations. This additional contextual information enhances the accuracy of entity and relation extraction. The generated radiology graphs can be applied to various downstream tasks, such as free-text or structured reports generation and multi-label classification of pathologies. Our approach represents a promising method for generating radiology graphs directly from CXR images, and has significant potential for improving medical image analysis and clinical decision-making.

* 12 pages, 4 figures, 4 tables

Via

Access Paper or Ask Questions

On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks

Mar 26, 2023

HyunJun Jung, Patrick Ruhkamp, Guangyao Zhai, Nikolas Brasch, Yitong Li, Yannick Verdie, Jifei Song, Yiren Zhou, Anil Armagan, Slobodan Ilic(+3 more)

Figure 1 for On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks

Figure 2 for On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks

Figure 3 for On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks

Figure 4 for On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks

Abstract:Learning-based methods to solve dense 3D vision problems typically train on 3D sensor data. The respectively used principle of measuring distances provides advantages and drawbacks. These are typically not compared nor discussed in the literature due to a lack of multi-modal datasets. Texture-less regions are problematic for structure from motion and stereo, reflective material poses issues for active sensing, and distances for translucent objects are intricate to measure with existing hardware. Training on inaccurate or corrupt data induces model bias and hampers generalisation capabilities. These effects remain unnoticed if the sensor measurement is considered as ground truth during the evaluation. This paper investigates the effect of sensor errors for the dense 3D vision tasks of depth estimation and reconstruction. We rigorously show the significant impact of sensor characteristics on the learned predictions and notice generalisation issues arising from various technologies in everyday household environments. For evaluation, we introduce a carefully designed dataset\footnote{dataset available at https://github.com/Junggy/HAMMER-dataset} comprising measurements from commodity sensors, namely D-ToF, I-ToF, passive/active stereo, and monocular RGB+P. Our study quantifies the considerable sensor noise impact and paves the way to improved dense vision estimates and targeted data fusion.

* Accepted at CVPR 2023, Main Paper + Supp. Mat. arXiv admin note: substantial text overlap with arXiv:2205.04565

Via

Access Paper or Ask Questions

LABRAD-OR: Lightweight Memory Scene Graphs for Accurate Bimodal Reasoning in Dynamic Operating Rooms

Mar 23, 2023

Ege Özsoy, Tobias Czempiel, Felix Holm, Chantal Pellegrini, Nassir Navab

Figure 1 for LABRAD-OR: Lightweight Memory Scene Graphs for Accurate Bimodal Reasoning in Dynamic Operating Rooms

Figure 2 for LABRAD-OR: Lightweight Memory Scene Graphs for Accurate Bimodal Reasoning in Dynamic Operating Rooms

Figure 3 for LABRAD-OR: Lightweight Memory Scene Graphs for Accurate Bimodal Reasoning in Dynamic Operating Rooms

Figure 4 for LABRAD-OR: Lightweight Memory Scene Graphs for Accurate Bimodal Reasoning in Dynamic Operating Rooms

Abstract:Modern surgeries are performed in complex and dynamic settings, including ever-changing interactions between medical staff, patients, and equipment. The holistic modeling of the operating room (OR) is, therefore, a challenging but essential task, with the potential to optimize the performance of surgical teams and aid in developing new surgical technologies to improve patient outcomes. The holistic representation of surgical scenes as semantic scene graphs (SGG), where entities are represented as nodes and relations between them as edges, is a promising direction for fine-grained semantic OR understanding. We propose, for the first time, the use of temporal information for more accurate and consistent holistic OR modeling. Specifically, we introduce memory scene graphs, where the scene graphs of previous time steps act as the temporal representation guiding the current prediction. We design an end-to-end architecture that intelligently fuses the temporal information of our lightweight memory scene graphs with the visual information from point clouds and images. We evaluate our method on the 4D-OR dataset and demonstrate that integrating temporality leads to more accurate and consistent results achieving an +5% increase and a new SOTA of 0.88 in macro F1. This work opens the path for representing the entire surgery history with memory scene graphs and improves the holistic understanding in the OR. Introducing scene graphs as memory representations can offer a valuable tool for many temporal understanding tasks.

* 11 pages, 3 figures

Via

Access Paper or Ask Questions

MI-SegNet: Mutual Information-Based US Segmentation for Unseen Domain Generalization

Mar 22, 2023

Yuan Bi, Zhongliang Jiang, Ricarda Clarenbach, Reza Ghotbi, Angelos Karlas, Nassir Navab

Abstract:Generalization capabilities of learning-based medical image segmentation across domains are currently limited by the performance degradation caused by the domain shift, particularly for ultrasound (US) imaging. The quality of US images heavily relies on carefully tuned acoustic parameters, which vary across sonographers, machines, and settings. To improve the generalizability on US images across domains, we propose MI-SegNet, a novel mutual information (MI) based framework to explicitly disentangle the anatomical and domain feature representations; therefore, robust domain-independent segmentation can be expected. Two encoders are employed to extract the relevant features for the disentanglement. The segmentation only uses the anatomical feature map for its prediction. In order to force the encoders to learn meaningful feature representations a cross-reconstruction method is used during training. Transformations, specific to either domain or anatomy are applied to guide the encoders in their respective feature extraction task. Additionally, any MI present in both feature maps is punished to further promote separate feature spaces. We validate the generalizability of the proposed domain-independent segmentation approach on several datasets with varying parameters and machines. Furthermore, we demonstrate the effectiveness of the proposed MI-SegNet serving as a pre-trained model by comparing it with state-of-the-art networks.

Via

Access Paper or Ask Questions

Semantic Latent Space Regression of Diffusion Autoencoders for Vertebral Fracture Grading

Mar 21, 2023

Matthias Keicher, Matan Atad, David Schinz, Alexandra S. Gersing, Sarah C. Foreman, Sophia S. Goller, Juergen Weissinger, Jon Rischewski, Anna-Sophia Dietrich, Benedikt Wiestler(+2 more)

Figure 1 for Semantic Latent Space Regression of Diffusion Autoencoders for Vertebral Fracture Grading

Figure 2 for Semantic Latent Space Regression of Diffusion Autoencoders for Vertebral Fracture Grading

Figure 3 for Semantic Latent Space Regression of Diffusion Autoencoders for Vertebral Fracture Grading

Figure 4 for Semantic Latent Space Regression of Diffusion Autoencoders for Vertebral Fracture Grading

Abstract:Vertebral fractures are a consequence of osteoporosis, with significant health implications for affected patients. Unfortunately, grading their severity using CT exams is hard and subjective, motivating automated grading methods. However, current approaches are hindered by imbalance and scarcity of data and a lack of interpretability. To address these challenges, this paper proposes a novel approach that leverages unlabelled data to train a generative Diffusion Autoencoder (DAE) model as an unsupervised feature extractor. We model fracture grading as a continuous regression, which is more reflective of the smooth progression of fractures. Specifically, we use a binary, supervised fracture classifier to construct a hyperplane in the DAE's latent space. We then regress the severity of the fracture as a function of the distance to this hyperplane, calibrating the results to the Genant scale. Importantly, the generative nature of our method allows us to visualize different grades of a given vertebra, providing interpretability and insight into the features that contribute to automated grading.

* Under review

Via

Access Paper or Ask Questions

Location-Free Scene Graph Generation

Mar 20, 2023

Ege Özsoy, Felix Holm, Tobias Czempiel, Nassir Navab, Benjamin Busam

Figure 1 for Location-Free Scene Graph Generation

Figure 2 for Location-Free Scene Graph Generation

Figure 3 for Location-Free Scene Graph Generation

Figure 4 for Location-Free Scene Graph Generation

Abstract:Scene Graph Generation (SGG) is a challenging visual understanding task. It combines the detection of entities and relationships between them in a scene. Both previous works and existing evaluation metrics rely on bounding box labels, even though many downstream scene graph applications do not need location information. The need for localization labels significantly increases the annotation cost and hampers the creation of more and larger scene graph datasets. We suggest breaking the dependency of scene graphs on bounding box labels by proposing location-free scene graph generation (LF-SGG). This new task aims at predicting instances of entities, as well as their relationships, without spatial localization. To objectively evaluate the task, the predicted and ground truth scene graphs need to be compared. We solve this NP-hard problem through an efficient algorithm using branching. Additionally, we design the first LF-SGG method, Pix2SG, using autoregressive sequence modeling. Our proposed method is evaluated on Visual Genome and 4D-OR. Although using significantly fewer labels during training, we achieve 74.12\% of the location-supervised SOTA performance on Visual Genome and even outperform the best method on 4D-OR.

* 10 pages

Via

Access Paper or Ask Questions

IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction

Mar 16, 2023

Dekai Zhu, Guangyao Zhai, Yan Di, Fabian Manhardt, Hendrik Berkemeyer, Tuan Tran, Nassir Navab, Federico Tombari, Benjamin Busam

Abstract:Reliable multi-agent trajectory prediction is crucial for the safe planning and control of autonomous systems. Compared with single-agent cases, the major challenge in simultaneously processing multiple agents lies in modeling complex social interactions caused by various driving intentions and road conditions. Previous methods typically leverage graph-based message propagation or attention mechanism to encapsulate such interactions in the format of marginal probabilistic distributions. However, it is inherently sub-optimal. In this paper, we propose IPCC-TP, a novel relevance-aware module based on Incremental Pearson Correlation Coefficient to improve multi-agent interaction modeling. IPCC-TP learns pairwise joint Gaussian Distributions through the tightly-coupled estimation of the means and covariances according to interactive incremental movements. Our module can be conveniently embedded into existing multi-agent prediction methods to extend original motion distribution decoders. Extensive experiments on nuScenes and Argoverse 2 datasets demonstrate that IPCC-TP improves the performance of baselines by a large margin.

* CVPR 2023 accepted

Via

Access Paper or Ask Questions

Unsupervised Traffic Scene Generation with Synthetic 3D Scene Graphs

Mar 15, 2023

Artem Savkin, Rachid Ellouze, Nassir Navab, Federico Tombari

Figure 1 for Unsupervised Traffic Scene Generation with Synthetic 3D Scene Graphs

Figure 2 for Unsupervised Traffic Scene Generation with Synthetic 3D Scene Graphs

Figure 3 for Unsupervised Traffic Scene Generation with Synthetic 3D Scene Graphs

Figure 4 for Unsupervised Traffic Scene Generation with Synthetic 3D Scene Graphs

Abstract:Image synthesis driven by computer graphics achieved recently a remarkable realism, yet synthetic image data generated this way reveals a significant domain gap with respect to real-world data. This is especially true in autonomous driving scenarios, which represent a critical aspect for overcoming utilizing synthetic data for training neural networks. We propose a method based on domain-invariant scene representation to directly synthesize traffic scene imagery without rendering. Specifically, we rely on synthetic scene graphs as our internal representation and introduce an unsupervised neural network architecture for realistic traffic scene synthesis. We enhance synthetic scene graphs with spatial information about the scene and demonstrate the effectiveness of our approach through scene manipulation.

Via

Access Paper or Ask Questions

Pixel-Level Explanation of Multiple Instance Learning Models in Biomedical Single Cell Images

Mar 15, 2023

Ario Sadafi, Oleksandra Adonkina, Ashkan Khakzar, Peter Lienemann, Rudolf Matthias Hehr, Daniel Rueckert, Nassir Navab, Carsten Marr

Figure 1 for Pixel-Level Explanation of Multiple Instance Learning Models in Biomedical Single Cell Images

Figure 2 for Pixel-Level Explanation of Multiple Instance Learning Models in Biomedical Single Cell Images

Figure 3 for Pixel-Level Explanation of Multiple Instance Learning Models in Biomedical Single Cell Images

Figure 4 for Pixel-Level Explanation of Multiple Instance Learning Models in Biomedical Single Cell Images

Abstract:Explainability is a key requirement for computer-aided diagnosis systems in clinical decision-making. Multiple instance learning with attention pooling provides instance-level explainability, however for many clinical applications a deeper, pixel-level explanation is desirable, but missing so far. In this work, we investigate the use of four attribution methods to explain a multiple instance learning models: GradCAM, Layer-Wise Relevance Propagation (LRP), Information Bottleneck Attribution (IBA), and InputIBA. With this collection of methods, we can derive pixel-level explanations on for the task of diagnosing blood cancer from patients' blood smears. We study two datasets of acute myeloid leukemia with over 100 000 single cell images and observe how each attribution method performs on the multiple instance learning architecture focusing on different properties of the white blood single cells. Additionally, we compare attribution maps with the annotations of a medical expert to see how the model's decision-making differs from the human standard. Our study addresses the challenge of implementing pixel-level explainability in multiple instance learning models and provides insights for clinicians to better understand and trust decisions from computer-aided diagnosis systems.

* Accepted for publication at the international conference on Information Processing in Medical Imaging (IPMI 2023)

Via

Access Paper or Ask Questions

BEL: A Bag Embedding Loss for Transformer enhances Multiple Instance Whole Slide Image Classification

Mar 02, 2023

Daniel Sens, Ario Sadafi, Francesco Paolo Casale, Nassir Navab, Carsten Marr

Figure 1 for BEL: A Bag Embedding Loss for Transformer enhances Multiple Instance Whole Slide Image Classification

Figure 2 for BEL: A Bag Embedding Loss for Transformer enhances Multiple Instance Whole Slide Image Classification

Figure 3 for BEL: A Bag Embedding Loss for Transformer enhances Multiple Instance Whole Slide Image Classification

Abstract:Multiple Instance Learning (MIL) has become the predominant approach for classification tasks on gigapixel histopathology whole slide images (WSIs). Within the MIL framework, single WSIs (bags) are decomposed into patches (instances), with only WSI-level annotation available. Recent MIL approaches produce highly informative bag level representations by utilizing the transformer architecture's ability to model the dependencies between instances. However, when applied to high magnification datasets, problems emerge due to the large number of instances and the weak supervisory learning signal. To address this problem, we propose to additionally train transformers with a novel Bag Embedding Loss (BEL). BEL forces the model to learn a discriminative bag-level representation by minimizing the distance between bag embeddings of the same class and maximizing the distance between different classes. We evaluate BEL with the Transformer architecture TransMIL on two publicly available histopathology datasets, BRACS and CAMELYON17. We show that with BEL, TransMIL outperforms the baseline models on both datasets, thus contributing to the clinically highly relevant AI-based tumor classification of histological patient material.

Via

Access Paper or Ask Questions