Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Andreas Maier

Pattern Recognition Lab, FAU Erlangen-Nürnberg, Germany

Task-Specific Data Preparation for Deep Learning to Reconstruct Structures of Interest from Severely Truncated CBCT Data

Sep 13, 2024

Yixing Huang, Fuxin Fan, Ahmed Gomaa, Andreas Maier, Rainer Fietkau, Christoph Bert, Florian Putz

Figure 1 for Task-Specific Data Preparation for Deep Learning to Reconstruct Structures of Interest from Severely Truncated CBCT Data

Figure 2 for Task-Specific Data Preparation for Deep Learning to Reconstruct Structures of Interest from Severely Truncated CBCT Data

Figure 3 for Task-Specific Data Preparation for Deep Learning to Reconstruct Structures of Interest from Severely Truncated CBCT Data

Figure 4 for Task-Specific Data Preparation for Deep Learning to Reconstruct Structures of Interest from Severely Truncated CBCT Data

Abstract:Cone-beam computed tomography (CBCT) is widely used in interventional surgeries and radiation oncology. Due to the limited size of flat-panel detectors, anatomical structures might be missing outside the limited field-of-view (FOV), which restricts the clinical applications of CBCT systems. Recently, deep learning methods have been proposed to extend the FOV for multi-slice CT systems. However, in mobile CBCT system with a smaller FOV size, projection data is severely truncated and it is challenging for a network to restore all missing structures outside the FOV. In some applications, only certain structures outside the FOV are of interest, e.g., ribs in needle path planning for liver/lung cancer diagnosis. Therefore, a task-specific data preparation method is proposed in this work, which automatically let the network focus on structures of interest instead of all the structures. Our preliminary experiment shows that Pix2pixGAN with a conventional training has the risk to reconstruct false positive and false negative rib structures from severely truncated CBCT data, whereas Pix2pixGAN with the proposed task-specific training can reconstruct all the ribs reliably. The proposed method is promising to empower CBCT with more clinical applications.

* Published in the CT-Meeting 2024 proceeding. arXiv admin note: text overlap with arXiv:2108.13844

Via

Access Paper or Ask Questions

Enhancing Cross-Modality Synthesis: Subvolume Merging for MRI-to-CT Conversion

Sep 09, 2024

Fuxin Fan, Jingna Qiu, Yixing Huang, Andreas Maier

Figure 1 for Enhancing Cross-Modality Synthesis: Subvolume Merging for MRI-to-CT Conversion

Figure 2 for Enhancing Cross-Modality Synthesis: Subvolume Merging for MRI-to-CT Conversion

Figure 3 for Enhancing Cross-Modality Synthesis: Subvolume Merging for MRI-to-CT Conversion

Figure 4 for Enhancing Cross-Modality Synthesis: Subvolume Merging for MRI-to-CT Conversion

Abstract:Providing more precise tissue attenuation information, synthetic computed tomography (sCT) generated from magnetic resonance imaging (MRI) contributes to improved radiation therapy treatment planning. In our study, we employ the advanced SwinUNETR framework for synthesizing CT from MRI images. Additionally, we introduce a three-dimensional subvolume merging technique in the prediction process. By selecting an optimal overlap percentage for adjacent subvolumes, stitching artifacts are effectively mitigated, leading to a decrease in the mean absolute error (MAE) between sCT and the labels from 52.65 HU to 47.75 HU. Furthermore, implementing a weight function with a gamma value of 0.9 results in the lowest MAE within the same overlap area. By setting the overlap percentage between 50% and 70%, we achieve a balance between image quality and computational efficiency.

Via

Access Paper or Ask Questions

Zero-Shot Paragraph-level Handwriting Imitation with Latent Diffusion Models

Sep 01, 2024

Martin Mayr, Marcel Dreier, Florian Kordon, Mathias Seuret, Jochen Zöllner, Fei Wu, Andreas Maier, Vincent Christlein

Figure 1 for Zero-Shot Paragraph-level Handwriting Imitation with Latent Diffusion Models

Figure 2 for Zero-Shot Paragraph-level Handwriting Imitation with Latent Diffusion Models

Figure 3 for Zero-Shot Paragraph-level Handwriting Imitation with Latent Diffusion Models

Figure 4 for Zero-Shot Paragraph-level Handwriting Imitation with Latent Diffusion Models

Abstract:The imitation of cursive handwriting is mainly limited to generating handwritten words or lines. Multiple synthetic outputs must be stitched together to create paragraphs or whole pages, whereby consistency and layout information are lost. To close this gap, we propose a method for imitating handwriting at the paragraph level that also works for unseen writing styles. Therefore, we introduce a modified latent diffusion model that enriches the encoder-decoder mechanism with specialized loss functions that explicitly preserve the style and content. We enhance the attention mechanism of the diffusion model with adaptive 2D positional encoding and the conditioning mechanism to work with two modalities simultaneously: a style image and the target text. This significantly improves the realism of the generated handwriting. Our approach sets a new benchmark in our comprehensive evaluation. It outperforms all existing imitation methods at both line and paragraph levels, considering combined style and content preservation.

Via

Access Paper or Ask Questions

Generalist Segmentation Algorithm for Photoreceptors Analysis in Adaptive Optics Imaging

Aug 27, 2024

Mikhail Kulyabin, Aline Sindel, Hilde Pedersen, Stuart Gilson, Rigmor Baraas, Andreas Maier

Abstract:Analyzing the cone photoreceptor pattern in images obtained from the living human retina using quantitative methods can be crucial for the early detection and management of various eye conditions. Confocal adaptive optics scanning light ophthalmoscope (AOSLO) imaging enables visualization of the cones from reflections of waveguiding cone photoreceptors. While there have been significant improvements in automated algorithms for segmenting cones in confocal AOSLO images, the process of labelling data remains labor-intensive and manual. This paper introduces a method based on deep learning (DL) for detecting and segmenting cones in AOSLO images. The models were trained on a semi-automatically labelled dataset of 20 AOSLO batches of images of 18 participants for 0$^{\circ}$, 1$^{\circ}$, and 2$^{\circ}$ from the foveal center. F1 scores were 0.968, 0.958, and 0.954 for 0$^{\circ}$, 1$^{\circ}$, and 2$^{\circ}$, respectively, which is better than previously reported DL approaches. Our method minimizes the need for labelled data by only necessitating a fraction of labelled cones, which is especially beneficial in the field of ophthalmology, where labelled data can often be limited.

Via

Access Paper or Ask Questions

Fine-Tuning a Local LLaMA-3 Large Language Model for Automated Privacy-Preserving Physician Letter Generation in Radiation Oncology

Aug 20, 2024

Yihao Hou, Christoph Bert, Ahmed Gomaa, Godehard Lahmer, Daniel Hoefler, Thomas Weissmann, Raphaela Voigt, Philipp Schubert, Charlotte Schmitter, Alina Depardon(+5 more)

Figure 1 for Fine-Tuning a Local LLaMA-3 Large Language Model for Automated Privacy-Preserving Physician Letter Generation in Radiation Oncology

Figure 2 for Fine-Tuning a Local LLaMA-3 Large Language Model for Automated Privacy-Preserving Physician Letter Generation in Radiation Oncology

Figure 3 for Fine-Tuning a Local LLaMA-3 Large Language Model for Automated Privacy-Preserving Physician Letter Generation in Radiation Oncology

Figure 4 for Fine-Tuning a Local LLaMA-3 Large Language Model for Automated Privacy-Preserving Physician Letter Generation in Radiation Oncology

Abstract:Generating physician letters is a time-consuming task in daily clinical practice. This study investigates local fine-tuning of large language models (LLMs), specifically LLaMA models, for physician letter generation in a privacy-preserving manner within the field of radiation oncology. Our findings demonstrate that base LLaMA models, without fine-tuning, are inadequate for effectively generating physician letters. The QLoRA algorithm provides an efficient method for local intra-institutional fine-tuning of LLMs with limited computational resources (i.e., a single 48 GB GPU workstation within the hospital). The fine-tuned LLM successfully learns radiation oncology-specific information and generates physician letters in an institution-specific style. ROUGE scores of the generated summary reports highlight the superiority of the 8B LLaMA-3 model over the 13B LLaMA-2 model. Further multidimensional physician evaluations of 10 cases reveal that, although the fine-tuned LLaMA-3 model has limited capacity to generate content beyond the provided input data, it successfully generates salutations, diagnoses and treatment histories, recommendations for further treatment, and planned schedules. Overall, clinical benefit was rated highly by the clinical experts (average score of 3.44 on a 4-point scale). With careful physician review and correction, automated LLM-based physician letter generation has significant practical value.

Via

Access Paper or Ask Questions

UnPaSt: unsupervised patient stratification by differentially expressed biclusters in omics data

Jul 31, 2024

Michael Hartung, Andreas Maier, Fernando Delgado-Chaves, Yuliya Burankova, Olga I. Isaeva, Fábio Malta de Sá Patroni, Daniel He, Casey Shannon, Katharina Kaufmann, Jens Lohmann(+12 more)

Abstract:Most complex diseases, including cancer and non-malignant diseases like asthma, have distinct molecular subtypes that require distinct clinical approaches. However, existing computational patient stratification methods have been benchmarked almost exclusively on cancer omics data and only perform well when mutually exclusive subtypes can be characterized by many biomarkers. Here, we contribute with a massive evaluation attempt, quantitatively exploring the power of 22 unsupervised patient stratification methods using both, simulated and real transcriptome data. From this experience, we developed UnPaSt (https://apps.cosy.bio/unpast/) optimizing unsupervised patient stratification, working even with only a limited number of subtype-predictive biomarkers. We evaluated all 23 methods on real-world breast cancer and asthma transcriptomics data. Although many methods reliably detected major breast cancer subtypes, only few identified Th2-high asthma, and UnPaSt significantly outperformed its closest competitors in both test datasets. Essentially, we showed that UnPaSt can detect many biologically insightful and reproducible patterns in omic datasets.

* The first two authors listed are joint first authors. The last two authors listed are joint last authors

Via

Access Paper or Ask Questions

Novel Artistic Scene-Centric Datasets for Effective Transfer Learning in Fragrant Spaces

Jul 16, 2024

Shumei Liu, Haiting Huang, Mathias Zinnen, Andreas Maier, Vincent Christlein

Figure 1 for Novel Artistic Scene-Centric Datasets for Effective Transfer Learning in Fragrant Spaces

Figure 2 for Novel Artistic Scene-Centric Datasets for Effective Transfer Learning in Fragrant Spaces

Figure 3 for Novel Artistic Scene-Centric Datasets for Effective Transfer Learning in Fragrant Spaces

Figure 4 for Novel Artistic Scene-Centric Datasets for Effective Transfer Learning in Fragrant Spaces

Abstract:Olfaction, often overlooked in cultural heritage studies, holds profound significance in shaping human experiences and identities. Examining historical depictions of olfactory scenes can offer valuable insights into the role of smells in history. We show that a transfer-learning approach using weakly labeled training data can remarkably improve the classification of fragrant spaces and, more generally, artistic scene depictions. We fine-tune Places365-pre-trained models by querying two cultural heritage data sources and using the search terms as supervision signal. The models are evaluated on two manually corrected test splits. This work lays a foundation for further exploration of fragrant spaces recognition and artistic scene classification. All images and labels are released as the ArtPlaces dataset at https://zenodo.org/doi/10.5281/zenodo.11584328.

Via

Access Paper or Ask Questions

Synthetic Electroretinogram Signal Generation Using Conditional Generative Adversarial Network for Enhancing Classification of Autism Spectrum Disorder

Jul 11, 2024

Mikhail Kulyabin, Paul A. Constable, Aleksei Zhdanov, Irene O. Lee, David H. Skuse, Dorothy A. Thompson, Andreas Maier

Figure 1 for Synthetic Electroretinogram Signal Generation Using Conditional Generative Adversarial Network for Enhancing Classification of Autism Spectrum Disorder

Figure 2 for Synthetic Electroretinogram Signal Generation Using Conditional Generative Adversarial Network for Enhancing Classification of Autism Spectrum Disorder

Figure 3 for Synthetic Electroretinogram Signal Generation Using Conditional Generative Adversarial Network for Enhancing Classification of Autism Spectrum Disorder

Figure 4 for Synthetic Electroretinogram Signal Generation Using Conditional Generative Adversarial Network for Enhancing Classification of Autism Spectrum Disorder

Abstract:The electroretinogram (ERG) is a clinical test that records the retina's electrical response to light. The ERG is a promising way to study different neurodevelopmental and neurodegenerative disorders, including autism spectrum disorder (ASD) - a neurodevelopmental condition that impacts language, communication, and reciprocal social interactions. However, in heterogeneous populations, such as ASD, where the ability to collect large datasets is limited, the application of artificial intelligence (AI) is complicated. Synthetic ERG signals generated from real ERG recordings carry similar information as natural ERGs and, therefore, could be used as an extension for natural data to increase datasets so that AI applications can be fully utilized. As proof of principle, this study presents a Generative Adversarial Network capable of generating synthetic ERG signals of children with ASD and typically developing control individuals. We applied a Time Series Transformer and Visual Transformer with Continuous Wavelet Transform to enhance classification results on the extended synthetic signals dataset. This approach may support classification models in related psychiatric conditions where the ERG may help classify disorders.

Via

Access Paper or Ask Questions

Smell and Emotion: Recognising emotions in smell-related artworks

Jul 05, 2024

Vishal Patoliya, Mathias Zinnen, Andreas Maier, Vincent Christlein

Figure 1 for Smell and Emotion: Recognising emotions in smell-related artworks

Figure 2 for Smell and Emotion: Recognising emotions in smell-related artworks

Figure 3 for Smell and Emotion: Recognising emotions in smell-related artworks

Figure 4 for Smell and Emotion: Recognising emotions in smell-related artworks

Abstract:Emotions and smell are underrepresented in digital art history. In this exploratory work, we show that recognising emotions from smell-related artworks is technically feasible but has room for improvement. Using style transfer and hyperparameter optimization we achieve a minor performance boost and open up the field for future extensions.

* 5 pages, 3 figures

Via

Access Paper or Ask Questions

Speaker- and Text-Independent Estimation of Articulatory Movements and Phoneme Alignments from Speech

Jul 03, 2024

Tobias Weise, Philipp Klumpp, Kubilay Can Demir, Paula Andrea Pérez-Toro, Maria Schuster, Elmar Noeth, Bjoern Heismann, Andreas Maier, Seung Hee Yang

Abstract:This paper introduces a novel combination of two tasks, previously treated separately: acoustic-to-articulatory speech inversion (AAI) and phoneme-to-articulatory (PTA) motion estimation. We refer to this joint task as acoustic phoneme-to-articulatory speech inversion (APTAI) and explore two different approaches, both working speaker- and text-independently during inference. We use a multi-task learning setup, with the end-to-end goal of taking raw speech as input and estimating the corresponding articulatory movements, phoneme sequence, and phoneme alignment. While both proposed approaches share these same requirements, they differ in their way of achieving phoneme-related predictions: one is based on frame classification, the other on a two-staged training procedure and forced alignment. We reach competitive performance of 0.73 mean correlation for the AAI task and achieve up to approximately 87% frame overlap compared to a state-of-the-art text-dependent phoneme force aligner.

* to be published in Interspeech 2024 proceedings

Via

Access Paper or Ask Questions