Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Andreas Maier

Pattern Recognition Lab, FAU Erlangen-Nürnberg, Germany

ProPLIKS: Probablistic 3D human body pose estimation

Dec 05, 2024

Karthik Shetty, Annette Birkhold, Bernhard Egger, Srikrishna Jaganathan, Norbert Strobel, Markus Kowarschik, Andreas Maier

Figure 1 for ProPLIKS: Probablistic 3D human body pose estimation

Figure 2 for ProPLIKS: Probablistic 3D human body pose estimation

Figure 3 for ProPLIKS: Probablistic 3D human body pose estimation

Figure 4 for ProPLIKS: Probablistic 3D human body pose estimation

Abstract:We present a novel approach for 3D human pose estimation by employing probabilistic modeling. This approach leverages the advantages of normalizing flows in non-Euclidean geometries to address uncertain poses. Specifically, our method employs normalizing flow tailored to the SO(3) rotational group, incorporating a coupling mechanism based on the M\"obius transformation. This enables the framework to accurately represent any distribution on SO(3), effectively addressing issues related to discontinuities. Additionally, we reinterpret the challenge of reconstructing 3D human figures from 2D pixel-aligned inputs as the task of mapping these inputs to a range of probable poses. This perspective acknowledges the intrinsic ambiguity of the task and facilitates a straightforward integration method for multi-view scenarios. The combination of these strategies showcases the effectiveness of probabilistic models in complex scenarios for human pose estimation techniques. Our approach notably surpasses existing methods in the field of pose estimation. We also validate our methodology on human pose estimation from RGB images as well as medical X-Ray datasets.

Via

Access Paper or Ask Questions

Gesture Classification in Artworks Using Contextual Image Features

Dec 04, 2024

Azhar Hussian, Mathias Zinnen, Thi My Hang Tran, Andreas Maier, Vincent Christlein

Figure 1 for Gesture Classification in Artworks Using Contextual Image Features

Figure 2 for Gesture Classification in Artworks Using Contextual Image Features

Figure 3 for Gesture Classification in Artworks Using Contextual Image Features

Figure 4 for Gesture Classification in Artworks Using Contextual Image Features

Abstract:Recognizing gestures in artworks can add a valuable dimension to art understanding and help to acknowledge the role of the sense of smell in cultural heritage. We propose a method to recognize smell gestures in historical artworks. We show that combining local features with global image context improves classification performance notably on different backbones.

* Digital Humanities Conference, Arlington, USA, 2024, pp.287-290

Via

Access Paper or Ask Questions

Neural Image Unfolding: Flattening Sparse Anatomical Structures using Neural Fields

Nov 27, 2024

Leonhard Rist, Pluvio Stephan, Noah Maul, Linda Vorberg, Hendrik Ditt, Michael Sühling, Andreas Maier, Bernhard Egger, Oliver Taubmann

Figure 1 for Neural Image Unfolding: Flattening Sparse Anatomical Structures using Neural Fields

Figure 2 for Neural Image Unfolding: Flattening Sparse Anatomical Structures using Neural Fields

Figure 3 for Neural Image Unfolding: Flattening Sparse Anatomical Structures using Neural Fields

Figure 4 for Neural Image Unfolding: Flattening Sparse Anatomical Structures using Neural Fields

Abstract:Tomographic imaging reveals internal structures of 3D objects and is crucial for medical diagnoses. Visualizing the morphology and appearance of non-planar sparse anatomical structures that extend over multiple 2D slices in tomographic volumes is inherently difficult but valuable for decision-making and reporting. Hence, various organ-specific unfolding techniques exist to map their densely sampled 3D surfaces to a distortion-minimized 2D representation. However, there is no versatile framework to flatten complex sparse structures including vascular, duct or bone systems. We deploy a neural field to fit the transformation of the anatomy of interest to a 2D overview image. We further propose distortion regularization strategies and combine geometric with intensity-based loss formulations to also display non-annotated and auxiliary targets. In addition to improved versatility, our unfolding technique outperforms mesh-based baselines for sparse structures w.r.t. peak distortion and our regularization scheme yields smoother transformations compared to Jacobian formulations from neural field-based image registration.

Via

Access Paper or Ask Questions

Probing for Consciousness in Machines

Nov 25, 2024

Mathis Immertreu, Achim Schilling, Andreas Maier, Patrick Krauss

Figure 1 for Probing for Consciousness in Machines

Figure 2 for Probing for Consciousness in Machines

Figure 3 for Probing for Consciousness in Machines

Figure 4 for Probing for Consciousness in Machines

Abstract:This study explores the potential for artificial agents to develop core consciousness, as proposed by Antonio Damasio's theory of consciousness. According to Damasio, the emergence of core consciousness relies on the integration of a self model, informed by representations of emotions and feelings, and a world model. We hypothesize that an artificial agent, trained via reinforcement learning (RL) in a virtual environment, can develop preliminary forms of these models as a byproduct of its primary task. The agent's main objective is to learn to play a video game and explore the environment. To evaluate the emergence of world and self models, we employ probes-feedforward classifiers that use the activations of the trained agent's neural networks to predict the spatial positions of the agent itself. Our results demonstrate that the agent can form rudimentary world and self models, suggesting a pathway toward developing machine consciousness. This research provides foundational insights into the capabilities of artificial agents in mirroring aspects of human consciousness, with implications for future advancements in artificial intelligence.

Via

Access Paper or Ask Questions

Nonlinear Neural Dynamics and Classification Accuracy in Reservoir Computing

Nov 15, 2024

Claus Metzner, Achim Schilling, Andreas Maier, Patrick Krauss

Figure 1 for Nonlinear Neural Dynamics and Classification Accuracy in Reservoir Computing

Figure 2 for Nonlinear Neural Dynamics and Classification Accuracy in Reservoir Computing

Figure 3 for Nonlinear Neural Dynamics and Classification Accuracy in Reservoir Computing

Figure 4 for Nonlinear Neural Dynamics and Classification Accuracy in Reservoir Computing

Abstract:Reservoir computing - information processing based on untrained recurrent neural networks with random connections - is expected to depend on the nonlinear properties of the neurons and the resulting oscillatory, chaotic, or fixpoint dynamics of the network. However, the required degree of nonlinearity and the range of suitable dynamical regimes for a given task are not fully understood. To clarify these questions, we study the accuracy of a reservoir computer in artificial classification tasks of varying complexity, while tuning the neuron's degree of nonlinearity and the reservoir's dynamical regime. We find that, even for activation functions with extremely reduced nonlinearity, weak recurrent interactions and small input signals, the reservoir is able to compute useful representations, detectable only in higher order principal components, that render complex classificiation tasks linearly separable for the readout layer. When increasing the recurrent coupling, the reservoir develops spontaneous dynamical behavior. Nevertheless, the input-related computations can 'ride on top' of oscillatory or fixpoint attractors without much loss of accuracy, whereas chaotic dynamics reduces task performance more drastically. By tuning the system through the full range of dynamical phases, we find that the accuracy peaks both at the oscillatory/chaotic and at the chaotic/fixpoint phase boundaries, thus supporting the 'edge of chaos' hypothesis. Our results, in particular the robust weakly nonlinear operating regime, may offer new perspectives both for technical and biological neural networks with random connectivity.

Via

Access Paper or Ask Questions

A Realistic Collimated X-Ray Image Simulation Pipeline

Nov 15, 2024

Benjamin El-Zein, Dominik Eckert, Thomas Weber, Maximilian Rohleder, Ludwig Ritschl, Steffen Kappler, Andreas Maier

Abstract:Collimator detection remains a challenging task in X-ray systems with unreliable or non-available information about the detectors position relative to the source. This paper presents a physically motivated image processing pipeline for simulating the characteristics of collimator shadows in X-ray images. By generating randomized labels for collimator shapes and locations, incorporating scattered radiation simulation, and including Poisson noise, the pipeline enables the expansion of limited datasets for training deep neural networks. We validate the proposed pipeline by a qualitative and quantitative comparison against real collimator shadows. Furthermore, it is demonstrated that utilizing simulated data within our deep learning framework not only serves as a suitable substitute for actual collimators but also enhances the generalization performance when applied to real-world data.

Via

Access Paper or Ask Questions

Nuremberg Letterbooks: A Multi-Transcriptional Dataset of Early 15th Century Manuscripts for Document Analysis

Nov 11, 2024

Martin Mayr, Julian Krenz, Katharina Neumeier, Anna Bub, Simon Bürcky, Nina Brolich, Klaus Herbers, Mechthild Habermann, Peter Fleischmann, Andreas Maier(+1 more)

Figure 1 for Nuremberg Letterbooks: A Multi-Transcriptional Dataset of Early 15th Century Manuscripts for Document Analysis

Figure 2 for Nuremberg Letterbooks: A Multi-Transcriptional Dataset of Early 15th Century Manuscripts for Document Analysis

Figure 3 for Nuremberg Letterbooks: A Multi-Transcriptional Dataset of Early 15th Century Manuscripts for Document Analysis

Figure 4 for Nuremberg Letterbooks: A Multi-Transcriptional Dataset of Early 15th Century Manuscripts for Document Analysis

Abstract:Most datasets in the field of document analysis utilize highly standardized labels, which, while simplifying specific tasks, often produce outputs that are not directly applicable to humanities research. In contrast, the Nuremberg Letterbooks dataset, which comprises historical documents from the early 15th century, addresses this gap by providing multiple types of transcriptions and accompanying metadata. This approach allows for developing methods that are more closely aligned with the needs of the humanities. The dataset includes 4 books containing 1711 labeled pages written by 10 scribes. Three types of transcriptions are provided for handwritten text recognition: Basic, diplomatic, and regularized. For the latter two, versions with and without expanded abbreviations are also available. A combination of letter ID and writer ID supports writer identification due to changing writers within pages. In the technical validation, we established baselines for various tasks, demonstrating data consistency and providing benchmarks for future research to build upon.

Via

Access Paper or Ask Questions

Automated Segmentation and Analysis of Cone Photoreceptors in Multimodal Adaptive Optics Imaging

Oct 19, 2024

Prajol Shrestha, Mikhail Kulyabin, Aline Sindel, Hilde R. Pedersen, Stuart Gilson, Rigmor Baraas, Andreas Maier

Abstract:Accurate detection and segmentation of cone cells in the retina are essential for diagnosing and managing retinal diseases. In this study, we used advanced imaging techniques, including confocal and non-confocal split detector images from adaptive optics scanning light ophthalmoscopy (AOSLO), to analyze photoreceptors for improved accuracy. Precise segmentation is crucial for understanding each cone cell's shape, area, and distribution. It helps to estimate the surrounding areas occupied by rods, which allows the calculation of the density of cone photoreceptors in the area of interest. In turn, density is critical for evaluating overall retinal health and functionality. We explored two U-Net-based segmentation models: StarDist for confocal and Cellpose for calculated modalities. Analyzing cone cells in images from two modalities and achieving consistent results demonstrates the study's reliability and potential for clinical application.

Via

Access Paper or Ask Questions

DRACO: Differentiable Reconstruction for Arbitrary CBCT Orbits

Oct 18, 2024

Chengze Ye, Linda-Sophie Schneider, Yipeng Sun, Mareike Thies, Siyuan Mei, Andreas Maier

Figure 1 for DRACO: Differentiable Reconstruction for Arbitrary CBCT Orbits

Figure 2 for DRACO: Differentiable Reconstruction for Arbitrary CBCT Orbits

Figure 3 for DRACO: Differentiable Reconstruction for Arbitrary CBCT Orbits

Figure 4 for DRACO: Differentiable Reconstruction for Arbitrary CBCT Orbits

Abstract:This paper introduces a novel method for reconstructing cone beam computed tomography (CBCT) images for arbitrary orbits using a differentiable shift-variant filtered backprojection (FBP) neural network. Traditional CBCT reconstruction methods for arbitrary orbits, like iterative reconstruction algorithms, are computationally expensive and memory-intensive. The proposed method addresses these challenges by employing a shift-variant FBP algorithm optimized for arbitrary trajectories through a deep learning approach that adapts to a specific orbit geometry. This approach overcomes the limitations of existing techniques by integrating known operators into the learning model, minimizing the number of parameters, and improving the interpretability of the model. The proposed method is a significant advancement in interventional medical imaging, particularly for robotic C-arm CT systems, enabling faster and more accurate CBCT reconstructions with customized orbits. Especially this method can also be used for the analytical reconstruction of non-continuous orbits like circular plus arc. The experimental results demonstrate that the proposed method significantly accelerates the reconstruction process compared to conventional iterative algorithms. It achieves comparable or superior image quality, as evidenced by metrics such as the mean squared error (MSE), the peak signal-to-noise ratio (PSNR), and the structural similarity index measure (SSIM). The validation experiments show that the method can handle data from different trajectories, demonstrating its flexibility and robustness across different scan geometries. Our method demonstrates a significant improvement, particularly for the sinusoidal trajectory, achieving a 38.6% reduction in MSE, a 7.7% increase in PSNR, and a 5.0% improvement in SSIM. Furthermore, the computation time for reconstruction was reduced by more than 97%.

Via

Access Paper or Ask Questions

Differential privacy for protecting patient data in speech disorder detection using deep learning

Sep 27, 2024

Soroosh Tayebi Arasteh, Mahshad Lotfinia, Paula Andrea Perez-Toro, Tomas Arias-Vergara, Juan Rafael Orozco-Arroyave, Maria Schuster, Andreas Maier, Seung Hee Yang

Figure 1 for Differential privacy for protecting patient data in speech disorder detection using deep learning

Figure 2 for Differential privacy for protecting patient data in speech disorder detection using deep learning

Figure 3 for Differential privacy for protecting patient data in speech disorder detection using deep learning

Figure 4 for Differential privacy for protecting patient data in speech disorder detection using deep learning

Abstract:Speech pathology has impacts on communication abilities and quality of life. While deep learning-based models have shown potential in diagnosing these disorders, the use of sensitive data raises critical privacy concerns. Although differential privacy (DP) has been explored in the medical imaging domain, its application in pathological speech analysis remains largely unexplored despite the equally critical privacy concerns. This study is the first to investigate DP's impact on pathological speech data, focusing on the trade-offs between privacy, diagnostic accuracy, and fairness. Using a large, real-world dataset of 200 hours of recordings from 2,839 German-speaking participants, we observed a maximum accuracy reduction of 3.85% when training with DP with a privacy budget, denoted by {\epsilon}, of 7.51. To generalize our findings, we validated our approach on a smaller dataset of Spanish-speaking Parkinson's disease patients, demonstrating that careful pretraining on large-scale task-specific datasets can maintain or even improve model accuracy under DP constraints. We also conducted a comprehensive fairness analysis, revealing that reasonable privacy levels (2<{\epsilon}<10) do not introduce significant gender bias, though age-related disparities may require further attention. Our results suggest that DP can effectively balance privacy and utility in speech disorder detection, but also highlight the unique challenges in the speech domain, particularly regarding the privacy-fairness trade-off. This provides a foundation for future work to refine DP methodologies and address fairness across diverse patient groups in real-world deployments.

Via

Access Paper or Ask Questions