Andreas Maier

Pattern Recognition Lab, FAU Erlangen-Nürnberg, Germany

A Vessel-Segmentation-Based CycleGAN for Unpaired Multi-modal Retinal Image Synthesis

Jun 05, 2023

Aline Sindel, Andreas Maier, Vincent Christlein

Abstract:Unpaired image-to-image translation of retinal images can efficiently increase the training dataset for deep-learning-based multi-modal retinal registration methods. Our method integrates a vessel segmentation network into the image-to-image translation task by extending the CycleGAN framework. The segmentation network is inserted prior to a UNet vision transformer generator network and serves as a shared representation between both domains. We reformulate the original identity loss to learn the direct mapping between the vessel segmentation and the real image. Additionally, we add a segmentation loss term to ensure shared vessel locations between fake and real images. In the experiments, our method shows a visually realistic look and preserves the vessel structures, which is a prerequisite for generating multi-modal training data for image registration.

* Accepted to BVM 2023

Federated learning for secure development of AI models for Parkinson's disease detection using speech from different languages

May 18, 2023

Soroosh Tayebi Arasteh, Cristian David Rios-Urrego, Elmar Noeth, Andreas Maier, Seung Hee Yang, Jan Rusz, Juan Rafael Orozco-Arroyave

Abstract:Parkinson's disease (PD) is a neurological disorder impacting a person's speech. Among automatic PD assessment methods, deep learning models have gained particular interest. Recently, the community has explored cross-pathology and cross-language models which can improve diagnostic accuracy even further. However, strict patient data privacy regulations largely prevent institutions from sharing patient speech data with each other. In this paper, we employ federated learning (FL) for PD detection using speech signals from 3 real-world language corpora of German, Spanish, and Czech, each from a separate institution. Our results indicate that the FL model outperforms all the local models in terms of diagnostic accuracy, while not performing very differently from the model based on centrally combined training sets, with the advantage of not requiring any data sharing among collaborators. This will simplify inter-institutional collaborations, resulting in enhancement of patient outcomes.

* Accepted for INTERSPEECH 2023
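
To make the federated setup concrete, here is a minimal sketch of one aggregation round in which only model parameters, never speech recordings, leave each institution. The abstract does not say which aggregation rule was used; the sample-size-weighted averaging below (in the style of FedAvg) is an assumption, and the function name, parameter layout, and toy numbers are hypothetical.

```python
from typing import Dict, List

# Hypothetical parameter layout: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def federated_average(site_weights: List[Weights], site_sizes: List[int]) -> Weights:
    """Average per-site model parameters, weighted by local sample count.

    Assumption: a FedAvg-style rule; the paper's actual rule is not
    stated in the abstract.
    """
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need one sample count per participating site")
    total = sum(site_sizes)
    averaged: Weights = {}
    for name, reference in site_weights[0].items():
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(site_weights, site_sizes)) / total
            for i in range(len(reference))
        ]
    return averaged


# Toy round with three sites (standing in for the three corpora).
local_models = [
    {"layer": [1.0, 2.0]},
    {"layer": [3.0, 4.0]},
    {"layer": [5.0, 6.0]},
]
sample_counts = [1, 1, 2]  # made-up sizes, for illustration only
global_model = federated_average(local_models, sample_counts)
print(global_model)
```

In a full training loop the averaged parameters would be sent back to every site, trained locally for a few steps, and aggregated again.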

Deep Multi-Frame Filtering for Hearing Aids

May 14, 2023

Hendrik Schröter, Tobias Rosenkranz, Alberto N. Escalante-B., Andreas Maier

Abstract:Multi-frame algorithms for single-channel speech enhancement are able to take advantage of short-time correlations within the speech signal. Deep filtering (DF) recently demonstrated its capabilities for low-latency scenarios like hearing aids (HAs) with its complex multi-frame (MF) filter. Alternatively, the complex filter can be estimated via an MF minimum variance distortionless response (MVDR) or MF Wiener filter (WF). Previous studies have shown that incorporating algorithm domain knowledge using an MVDR filter might be beneficial compared to the direct filter estimation via DF. In this work, we compare the use of various multi-frame filters such as DF, MF-MVDR, or MF-WF for HAs. We assess different covariance estimation methods for both MF-MVDR and MF-WF and objectively demonstrate an improved performance compared to direct DF estimation, significantly outperforming related work while improving the runtime performance.

* Submitted to Interspeech 2023
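
As a rough illustration of what a complex multi-frame filter does, the sketch below applies per-frame complex coefficients to the current and previous short-time spectrum values of a single frequency bin. The function name, the causal indexing without look-ahead, and the toy coefficients are assumptions made for illustration. How the coefficients are estimated (DF, MF-MVDR, or MF-WF) is the subject of the paper and is not shown here.

```python
from typing import List


def apply_multiframe_filter(
    bin_frames: List[complex], coefs: List[List[complex]]
) -> List[complex]:
    """Filter one frequency bin with per-frame complex multi-frame taps.

    bin_frames[k] is the noisy spectrum value of the bin at frame k.
    coefs[k][i] is the (hypothetical) tap applied to frame k - i.
    Frames before the start of the signal are treated as zero.
    """
    if len(bin_frames) != len(coefs):
        raise ValueError("need one coefficient vector per frame")
    enhanced = []
    for k, taps in enumerate(coefs):
        value = 0j
        for i, tap in enumerate(taps):
            if k - i >= 0:
                value += tap * bin_frames[k - i]
        enhanced.append(value)
    return enhanced


# Toy example: two taps that average the current and the previous frame.
noisy = [1 + 0j, 2 + 0j, 3 + 0j]
taps = [[0.5 + 0j, 0.5 + 0j] for _ in noisy]
print(apply_multiframe_filter(noisy, taps))
```

With a single tap equal to 1 the filter returns its input unchanged, which is a quick sanity check for the indexing.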

DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement

May 14, 2023

Hendrik Schröter, Tobias Rosenkranz, Alberto N. Escalante-B., Andreas Maier

Abstract:Multi-frame algorithms for single-channel speech enhancement are able to take advantage of short-time correlations within the speech signal. Deep Filtering (DF) was proposed to directly estimate a complex filter in the frequency domain to take advantage of these correlations. In this work, we present a real-time speech enhancement demo using DeepFilterNet. DeepFilterNet's efficiency is enabled by exploiting domain knowledge of speech production and psychoacoustic perception. Our model is able to match state-of-the-art speech enhancement benchmarks while achieving a real-time factor of 0.19 on a single-threaded notebook CPU. The framework as well as pretrained weights have been published under an open source license.

* Accepted as a Show and Tell demo at Interspeech 2023
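
For readers unfamiliar with the metric, the real-time factor is the ratio of processing time to audio duration, so values below 1 mean the system runs faster than real time. The short sketch below only does this arithmetic for the reported value of 0.19. The helper names and the 10-second clip length are hypothetical and are not taken from the DeepFilterNet code base.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by the duration of the processed audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def processing_time(rtf: float, audio_seconds: float) -> float:
    """Return the compute time implied by a real-time factor."""
    return rtf * audio_seconds


REPORTED_RTF = 0.19  # value quoted in the abstract
CLIP_SECONDS = 10.0  # hypothetical clip length, for illustration only

needed = processing_time(REPORTED_RTF, CLIP_SECONDS)
print(f"{CLIP_SECONDS:.0f} s of audio needs about {needed:.1f} s of compute")
print("faster than real time:", REPORTED_RTF < 1.0)
```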

Joint MR sequence optimization beats pure neural network approaches for spin-echo MRI super-resolution

May 12, 2023

Hoai Nam Dang, Vladimir Golkov, Thomas Wimmer, Daniel Cremers, Andreas Maier, Moritz Zaiss

Abstract:Current MRI super-resolution (SR) methods only use existing contrasts acquired from typical clinical sequences as input for the neural network (NN). In turbo spin echo (TSE) sequences, the sequence parameters can have a strong influence on the actual resolution of the acquired image and consequently have a considerable impact on the performance of the NN. We propose a known-operator learning approach to perform an end-to-end optimization of MR sequence and neural network parameters for SR-TSE. This MR-physics-informed training procedure jointly optimizes the radiofrequency pulse train of a proton density- (PD-) and T2-weighted TSE and a subsequently applied convolutional neural network to predict the corresponding PDw and T2w super-resolution TSE images. The found radiofrequency pulse train designs generate an optimal signal for the NN to perform the SR task. Our method generalizes from the simulation-based optimization to in vivo measurements, and the acquired physics-informed SR images show higher correlation with a time-consuming segmented high-resolution TSE sequence compared to a pure network training approach.

* 13 pages, 4 figures, 3 tables, submitted to MICCAI 2023 for review

Building a Non-native Speech Corpus Featuring Chinese-English Bilingual Children: Compilation and Rationale

Apr 30, 2023

Hiuchung Hung, Andreas Maier, Thorsten Piske

Abstract:This paper introduces a non-native speech corpus consisting of narratives from fifty 5- to 6-year-old Chinese-English bilingual children. Transcripts totaling 6.5 hours of children taking a narrative comprehension test in English (L2) are presented, along with human-rated scores and annotations of grammatical and pronunciation errors. The children also completed the parallel MAIN tests in Chinese (L1) for reference purposes. For all tests, we recorded audio and video with our self-developed remote collection methods. The video recordings serve to mitigate the challenge of low intelligibility in L2 narratives produced by young children during the transcription process. This corpus offers valuable resources for second language teaching and has the potential to enhance the overall performance of automatic speech recognition (ASR).

Benchmarking ChatGPT-4 on ACR Radiation Oncology In-Training Exam (TXIT): Potentials and Challenges for AI-Assisted Medical Education and Decision Making in Radiation Oncology

Apr 24, 2023

Yixing Huang, Ahmed Gomaa, Thomas Weissmann, Johanna Grigo, Hassen Ben Tkhayat, Benjamin Frey, Udo S. Gaipl, Luitpold V. Distel, Andreas Maier, Rainer Fietkau (+2 more)

Abstract:The potential of large language models in medicine for education and decision making purposes has been demonstrated as they achieve decent scores on medical exams such as the United States Medical Licensing Exam (USMLE) and the MedQA exam. In this work, we evaluate the performance of ChatGPT-3.5 and ChatGPT-4 in the specialized field of radiation oncology using the 38th American College of Radiology (ACR) radiation oncology in-training exam (TXIT). ChatGPT-3.5 and ChatGPT-4 have achieved the scores of 63.65% and 74.57%, respectively, highlighting the advantage of the latest ChatGPT-4 model. Based on the TXIT exam, ChatGPT-4's strong and weak areas in radiation oncology are identified to some extent. Specifically, ChatGPT-4 demonstrates good knowledge of statistics, CNS & eye, pediatrics, biology, and physics but has limitations in bone & soft tissue and gynecology, as per the ACR knowledge domain. Regarding clinical care paths, ChatGPT-4 performs well in diagnosis, prognosis, and toxicity but lacks proficiency in topics related to brachytherapy and dosimetry, as well as in-depth questions from clinical trials. While ChatGPT-4 is not yet suitable for clinical decision making in radiation oncology, it has the potential to assist in medical education for the general public and cancer patients. With further fine-tuning, it could assist radiation oncologists in recommending treatment decisions for challenging clinical cases based on the latest guidelines and the existing gray zone database.

Scale-Equivariant Deep Learning for 3D Data

Apr 12, 2023

Thomas Wimmer, Vladimir Golkov, Hoai Nam Dang, Moritz Zaiss, Andreas Maier, Daniel Cremers

Abstract:The ability of convolutional neural networks (CNNs) to recognize objects regardless of their position in the image is due to the translation-equivariance of the convolutional operation. Group-equivariant CNNs transfer this equivariance to other transformations of the input. Dealing appropriately with objects and object parts of different scale is challenging, and scale can vary for multiple reasons such as the underlying object size or the resolution of the imaging modality. In this paper, we propose a scale-equivariant convolutional network layer for three-dimensional data that guarantees scale-equivariance in 3D CNNs. Scale-equivariance lifts the burden of having to learn each possible scale separately, allowing the neural network to focus on higher-level learning goals, which leads to better results and better data-efficiency. We provide an overview of the theoretical foundations and scientific work on scale-equivariant neural networks in the two-dimensional domain. We then transfer the concepts from 2D to the three-dimensional space and create a scale-equivariant convolutional layer for 3D data. Using the proposed scale-equivariant layer, we create a scale-equivariant U-Net for medical image segmentation and compare it with a non-scale-equivariant baseline method. Our experiments demonstrate the effectiveness of the proposed method in achieving scale-equivariance for 3D medical image analysis. We publish our code at https://github.com/wimmerth/scale-equivariant-3d-convnet for further research and application.

* 12 pages, 4 figures
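
The translation-equivariance property that the abstract starts from can be checked numerically in a few lines. The sketch below uses a 1D circular cross-correlation as a stand-in for a convolutional layer and verifies that shifting the input and then filtering gives the same result as filtering and then shifting. It illustrates the general property only; it is not the scale-equivariant 3D layer from the paper, and all names and values are hypothetical.

```python
from typing import List


def circular_correlate(signal: List[int], kernel: List[int]) -> List[int]:
    """Cross-correlate a 1D signal with a kernel using wrap-around borders."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def circular_shift(signal: List[int], offset: int) -> List[int]:
    """Move every sample `offset` positions to the right, wrapping around."""
    n = len(signal)
    return [signal[(i - offset) % n] for i in range(n)]


signal = [0, 1, 4, 2, 0, 0, 3, 1]
kernel = [1, -2, 1]

shift_then_filter = circular_correlate(circular_shift(signal, 3), kernel)
filter_then_shift = circular_shift(circular_correlate(signal, kernel), 3)

# Equivariance: both orders of operations give the same feature map.
print(shift_then_filter == filter_then_shift)
```

A scale-equivariant layer aims for the analogous guarantee when the input is rescaled instead of shifted.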

Unsupervised detection of small hyperreflective features in ultrahigh resolution optical coherence tomography

Mar 26, 2023

Marcel Reimann, Jungeun Won, Hiroyuki Takahashi, Antonio Yaghy, Yunchan Hwang, Stefan Ploner, Junhong Lin, Jessica Girgis, Kenneth Lam, Siyu Chen (+3 more)

Abstract:Recent advances in optical coherence tomography, such as the development of high-speed, ultrahigh-resolution scanners and corresponding signal processing techniques, may reveal new potential biomarkers in retinal diseases. Newly visible features are, for example, small hyperreflective specks in age-related macular degeneration. Identifying these new markers is crucial to investigate potential association with disease progression and treatment outcomes. Therefore, it is necessary to reliably detect these features in 3D volumetric scans. Because manual labeling of entire volumes is infeasible, a need for automatic detection arises. Labeled datasets are often not publicly available, and there are usually large variations in scan protocols and scanner types. Thus, this work focuses on an unsupervised approach that is based on local peak detection and random walker segmentation to detect small features on each B-scan of the volume.

* Accepted as poster at BVM workshop 2023 (https://www.bvm-workshop.org/). The arXiv version provides full quality figures. 6 pages content (2 figures)
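
The local peak detection step mentioned in the abstract can be pictured with a deliberately simple 1D sketch: a sample counts as a peak if it is brighter than both neighbours and exceeds a minimum intensity. The paper works on 2D B-scans and follows the peaks with random walker segmentation, neither of which is reproduced here. The function name, the threshold, and the intensity values are hypothetical.

```python
from typing import List


def local_peaks(profile: List[float], min_height: float) -> List[int]:
    """Return indices of samples brighter than both neighbours and min_height."""
    return [
        i
        for i in range(1, len(profile) - 1)
        if profile[i] > profile[i - 1]
        and profile[i] > profile[i + 1]
        and profile[i] >= min_height
    ]


# Toy intensity profile with two bright specks and one weak bump.
intensities = [0.0, 1.0, 5.0, 1.0, 0.0, 7.0, 2.0, 2.5, 1.0]
print(local_peaks(intensities, min_height=3.0))
```

In an unsupervised pipeline such peak positions could serve as seed points for a subsequent segmentation step.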

Task-based Generation of Optimized Projection Sets using Differentiable Ranking

Mar 21, 2023

Linda-Sophie Schneider, Mareike Thies, Christopher Syben, Richard Schielein, Mathias Unberath, Andreas Maier

Abstract:We present a method for selecting valuable projections in computed tomography (CT) scans to enhance image reconstruction and diagnosis. The approach integrates two important factors, projection-based detectability and data completeness, into a single feed-forward neural network. The network evaluates the value of projections, processes them through a differentiable ranking function and makes the final selection using a straight-through estimator. Data completeness is ensured through the label provided during training. The approach eliminates the need for heuristically enforcing data completeness, which may exclude valuable projections. The method is evaluated on simulated data in a non-destructive testing scenario, where the aim is to maximize the reconstruction quality within a specified region of interest. We achieve comparable results to previous methods, laying the foundation for using reconstruction-based loss functions to learn the selection of projections.
