Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Nikolas Ebert

PointTransformerX:Portable and Efficient 3D Point Cloud Processing without Sparse Algorithms

Apr 27, 2026

Laurenz Reichardt, Nikolas Ebert, Oliver Wasenmüller

Abstract:3D point cloud perception remains tightly coupled to custom CUDA operators for spatial operations, limiting portability and efficiency on non-NVIDIA, AMD, and embedded hardware. We introduce PointTransformerX (PTX), a fully PyTorch-native vision transformer backbone for 3D point clouds, removing all custom CUDA operators and external libraries while retaining competitive accuracy. PTX introduces 3D-GS-RoPE, a rotary positional embedding that encodes 3D spatial relationships directly in self-attention without neighborhood construction, and further replaces sparse convolutional patch embedding with a linear projection. PTX explores inference-time scaling of attention windows to improve accuracy without retraining. With a redesigned feed-forward network, PTX achieves 98.7\% of PointTransformer V3's accuracy on ScanNet with 79.2\% fewer parameters and executing 1.6\times faster while requiring just 253 MB memory. PTX runs natively on NVIDIA GPUs, AMD GPUs (ROCm), and CPUs, providing an efficient and portable foundation for point cloud perception.

* This paper has been accepted at IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2026

Via

Access Paper or Ask Questions

IonMorphNet: Generalizable Learning of Ion Image Morphologies for Peak Picking in Mass Spectrometry Imaging

Apr 21, 2026

Philipp Weigand, Niels Nawrot, Nikolas Ebert, Carsten Hopf, Oliver Wasenmüller

Abstract:Peak picking is a fundamental preprocessing step in Mass Spectrometry Imaging (MSI), where each sample is represented by hundreds to thousands of ion images. Existing approaches require careful dataset-specific hyperparameter tuning, and often fail to generalize across acquisition protocols. We introduce IonMorphNet, a spatial-structure-aware representation model for ion images that enables fully data-driven peak picking without any task-specific supervision. We curate 53 publicly available MSI datasets and define six structural classes capturing representative spatial patterns in ion images to train standard image backbones for structural pattern classification. Once trained, IonMorphNet can assess ion images and perform peak picking without additional hyperparameter tuning. Using a ConvNeXt V2-Tiny backbone, our approach improves peak picking performance by +7 % mSCF1 compared to state-of-the-art methods across multiple datasets. Beyond peak picking, we demonstrate that spatially informed channel reduction enables a 3D CNN for patch-based tumor classification in MSI. This approach matches or exceeds pixel-wise spectral classifiers by up to +7.3 % Balanced Accuracy on three tumor classification tasks, indicating meaningful ion image selection. The source code and model weights are available at https://github.com/CeMOS-IS/IonMorphNet.

* This paper has been accepted at IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2026

Via

Access Paper or Ask Questions

Spatial self-supervised Peak Learning and correlation-based Evaluation of peak picking in Mass Spectrometry Imaging

Mar 11, 2026

Philipp Weigand, Nikolas Ebert, Shad A. Mohammed, Denis Abu Sammour, Carsten Hopf, Oliver Wasenmüller

Abstract:Mass spectrometry imaging (MSI) enables label-free visualization of molecular distributions across tissue samples but generates large and complex datasets that require effective peak picking to reduce data size while preserving meaningful biological information. Existing peak picking approaches perform inconsistently across heterogeneous datasets, and their evaluation is often limited to synthetic data or manually selected ion images that do not fully represent real-world challenges in MSI. To address these limitations, we propose an autoencoder-based spatial self-supervised peak learning neural network that selects spatially structured peaks by learning an attention mask leveraging both spatial and spectral information. We further introduce an evaluation procedure based on expert-annotated segmentation masks, allowing a more representative and spatially grounded assessment of peak picking performance. We evaluate our approach on four diverse public MSI datasets using our proposed evaluation procedure. Our approach consistently outperforms state-of-the-art peak picking methods by selecting spatially structured peaks, thus demonstrating its efficacy. These results highlight the value of our spatial self-supervised network in comparison to contemporary state-of-the-art methods. The evaluation procedure can be readily applied to new MSI datasets, thereby providing a consistent and robust framework for the comparison of spatially structured peak picking methods across different datasets.

Via

Access Paper or Ask Questions

D-PLS: Decoupled Semantic Segmentation for 4D-Panoptic-LiDAR-Segmentation

Jan 27, 2025

Maik Steinhauser, Laurenz Reichardt, Nikolas Ebert, Oliver Wasenmüller

Figure 1 for D-PLS: Decoupled Semantic Segmentation for 4D-Panoptic-LiDAR-Segmentation

Figure 2 for D-PLS: Decoupled Semantic Segmentation for 4D-Panoptic-LiDAR-Segmentation

Figure 3 for D-PLS: Decoupled Semantic Segmentation for 4D-Panoptic-LiDAR-Segmentation

Figure 4 for D-PLS: Decoupled Semantic Segmentation for 4D-Panoptic-LiDAR-Segmentation

Abstract:This paper introduces a novel approach to 4D Panoptic LiDAR Segmentation that decouples semantic and instance segmentation, leveraging single-scan semantic predictions as prior information for instance segmentation. Our method D-PLS first performs single-scan semantic segmentation and aggregates the results over time, using them to guide instance segmentation. The modular design of D-PLS allows for seamless integration on top of any semantic segmentation architecture, without requiring architectural changes or retraining. We evaluate our approach on the SemanticKITTI dataset, where it demonstrates significant improvements over the baseline in both classification and association tasks, as measured by the LiDAR Segmentation and Tracking Quality (LSTQ) metric. Furthermore, we show that our decoupled architecture not only enhances instance prediction but also surpasses the baseline due to advancements in single-scan semantic segmentation.

Via

Access Paper or Ask Questions

Classifier Ensemble for Efficient Uncertainty Calibration of Deep Neural Networks for Image Classification

Jan 17, 2025

Michael Schulze, Nikolas Ebert, Laurenz Reichardt, Oliver Wasenmüller

Figure 1 for Classifier Ensemble for Efficient Uncertainty Calibration of Deep Neural Networks for Image Classification

Figure 2 for Classifier Ensemble for Efficient Uncertainty Calibration of Deep Neural Networks for Image Classification

Figure 3 for Classifier Ensemble for Efficient Uncertainty Calibration of Deep Neural Networks for Image Classification

Figure 4 for Classifier Ensemble for Efficient Uncertainty Calibration of Deep Neural Networks for Image Classification

Abstract:This paper investigates novel classifier ensemble techniques for uncertainty calibration applied to various deep neural networks for image classification. We evaluate both accuracy and calibration metrics, focusing on Expected Calibration Error (ECE) and Maximum Calibration Error (MCE). Our work compares different methods for building simple yet efficient classifier ensembles, including majority voting and several metamodel-based approaches. Our evaluation reveals that while state-of-the-art deep neural networks for image classification achieve high accuracy on standard datasets, they frequently suffer from significant calibration errors. Basic ensemble techniques like majority voting provide modest improvements, while metamodel-based ensembles consistently reduce ECE and MCE across all architectures. Notably, the largest of our compared metamodels demonstrate the most substantial calibration improvements, with minimal impact on accuracy. Moreover, classifier ensembles with metamodels outperform traditional model ensembles in calibration performance, while requiring significantly fewer parameters. In comparison to traditional post-hoc calibration methods, our approach removes the need for a separate calibration dataset. These findings underscore the potential of our proposed metamodel-based classifier ensembles as an efficient and effective approach to improving model calibration, thereby contributing to more reliable deep learning systems.

* This paper has been accepted at International Conference on Computer Vision Theory and Applications (VISAPP), 2025

Via

Access Paper or Ask Questions

Boosting Few-Shot Detection with Large Language Models and Layout-to-Image Synthesis

Oct 09, 2024

Ahmed Abdullah, Nikolas Ebert, Oliver Wasenmüller

Figure 1 for Boosting Few-Shot Detection with Large Language Models and Layout-to-Image Synthesis

Figure 2 for Boosting Few-Shot Detection with Large Language Models and Layout-to-Image Synthesis

Figure 3 for Boosting Few-Shot Detection with Large Language Models and Layout-to-Image Synthesis

Figure 4 for Boosting Few-Shot Detection with Large Language Models and Layout-to-Image Synthesis

Abstract:Recent advancements in diffusion models have enabled a wide range of works exploiting their ability to generate high-volume, high-quality data for use in various downstream tasks. One subclass of such models, dubbed Layout-to-Image Synthesis (LIS), learns to generate images conditioned on a spatial layout (bounding boxes, masks, poses, etc.) and has shown a promising ability to generate realistic images, albeit with limited layout-adherence. Moreover, the question of how to effectively transfer those models for scalable augmentation of few-shot detection data remains unanswered. Thus, we propose a collaborative framework employing a Large Language Model (LLM) and an LIS model for enhancing few-shot detection beyond state-of-the-art generative augmentation approaches. We leverage LLM's reasoning ability to extrapolate the spatial prior of the annotation space by generating new bounding boxes given only a few example annotations. Additionally, we introduce our novel layout-aware CLIP score for sample ranking, enabling tight coupling between generated layouts and images. Significant improvements on COCO few-shot benchmarks are observed. With our approach, a YOLOX-S baseline is boosted by more than 140%, 50%, 35% in mAP on the COCO 5-,10-, and 30-shot settings, respectively.

* This paper has been accepted at the Asian Conference on Computer Vision (ACCV), 2024

Via

Access Paper or Ask Questions

GenFormer -- Generated Images are All You Need to Improve Robustness of Transformers on Small Datasets

Aug 27, 2024

Sven Oehri, Nikolas Ebert, Ahmed Abdullah, Didier Stricker, Oliver Wasenmüller

Figure 1 for GenFormer -- Generated Images are All You Need to Improve Robustness of Transformers on Small Datasets

Figure 2 for GenFormer -- Generated Images are All You Need to Improve Robustness of Transformers on Small Datasets

Figure 3 for GenFormer -- Generated Images are All You Need to Improve Robustness of Transformers on Small Datasets

Figure 4 for GenFormer -- Generated Images are All You Need to Improve Robustness of Transformers on Small Datasets

Abstract:Recent studies showcase the competitive accuracy of Vision Transformers (ViTs) in relation to Convolutional Neural Networks (CNNs), along with their remarkable robustness. However, ViTs demand a large amount of data to achieve adequate performance, which makes their application to small datasets challenging, falling behind CNNs. To overcome this, we propose GenFormer, a data augmentation strategy utilizing generated images, thereby improving transformer accuracy and robustness on small-scale image classification tasks. In our comprehensive evaluation we propose Tiny ImageNetV2, -R, and -A as new test set variants of Tiny ImageNet by transferring established ImageNet generalization and robustness benchmarks to the small-scale data domain. Similarly, we introduce MedMNIST-C and EuroSAT-C as corrupted test set variants of established fine-grained datasets in the medical and aerial domain. Through a series of experiments conducted on small datasets of various domains, including Tiny ImageNet, CIFAR, EuroSAT and MedMNIST datasets, we demonstrate the synergistic power of our method, in particular when combined with common train and test time augmentations, knowledge distillation, and architectural design choices. Additionally, we prove the effectiveness of our approach under challenging conditions with limited training data, demonstrating significant improvements in both accuracy and robustness, bridging the gap between CNNs and ViTs in the small-scale dataset domain.

* This paper has been accepted at International Conference on Pattern Recognition (ICPR), 2024

Via

Access Paper or Ask Questions

360$^\circ$ from a Single Camera: A Few-Shot Approach for LiDAR Segmentation

Sep 12, 2023

Laurenz Reichardt, Nikolas Ebert, Oliver Wasenmüller

$Figure 1 for 360$^\circ$ from a Single Camera: A Few-Shot Approach for LiDAR Segmentation$

$Figure 2 for 360$^\circ$ from a Single Camera: A Few-Shot Approach for LiDAR Segmentation$

$Figure 3 for 360$^\circ$ from a Single Camera: A Few-Shot Approach for LiDAR Segmentation$

$Figure 4 for 360$^\circ$ from a Single Camera: A Few-Shot Approach for LiDAR Segmentation$

Abstract:Deep learning applications on LiDAR data suffer from a strong domain gap when applied to different sensors or tasks. In order for these methods to obtain similar accuracy on different data in comparison to values reported on public benchmarks, a large scale annotated dataset is necessary. However, in practical applications labeled data is costly and time consuming to obtain. Such factors have triggered various research in label-efficient methods, but a large gap remains to their fully-supervised counterparts. Thus, we propose ImageTo360, an effective and streamlined few-shot approach to label-efficient LiDAR segmentation. Our method utilizes an image teacher network to generate semantic predictions for LiDAR data within a single camera view. The teacher is used to pretrain the LiDAR segmentation student network, prior to optional fine-tuning on 360$^\circ$ data. Our method is implemented in a modular manner on the point level and as such is generalizable to different architectures. We improve over the current state-of-the-art results for label-efficient methods and even surpass some traditional fully-supervised segmentation networks.

* ICCV Workshop 2023

Via

Access Paper or Ask Questions

Transformer-based Detection of Microorganisms on High-Resolution Petri Dish Images

Aug 21, 2023

Nikolas Ebert, Didier Stricker, Oliver Wasenmüller

Figure 1 for Transformer-based Detection of Microorganisms on High-Resolution Petri Dish Images

Figure 2 for Transformer-based Detection of Microorganisms on High-Resolution Petri Dish Images

Figure 3 for Transformer-based Detection of Microorganisms on High-Resolution Petri Dish Images

Figure 4 for Transformer-based Detection of Microorganisms on High-Resolution Petri Dish Images

Abstract:Many medical or pharmaceutical processes have strict guidelines regarding continuous hygiene monitoring. This often involves the labor-intensive task of manually counting microorganisms in Petri dishes by trained personnel. Automation attempts often struggle due to major challenges: significant scaling differences, low separation, low contrast, etc. To address these challenges, we introduce AttnPAFPN, a high-resolution detection pipeline that leverages a novel transformer variation, the efficient-global self-attention mechanism. Our streamlined approach can be easily integrated in almost any multi-scale object detection pipeline. In a comprehensive evaluation on the publicly available AGAR dataset, we demonstrate the superior accuracy of our network over the current state-of-the-art. In order to demonstrate the task-independent performance of our approach, we perform further experiments on COCO and LIVECell datasets.

* This paper has been accepted at IEEE International Conference on Computer Vision Workshops (ICCV workshop), 2023

Via

Access Paper or Ask Questions

Light-Weight Vision Transformer with Parallel Local and Global Self-Attention

Jul 18, 2023

Nikolas Ebert, Laurenz Reichardt, Didier Stricker, Oliver Wasenmüller

Figure 1 for Light-Weight Vision Transformer with Parallel Local and Global Self-Attention

Figure 2 for Light-Weight Vision Transformer with Parallel Local and Global Self-Attention

Figure 3 for Light-Weight Vision Transformer with Parallel Local and Global Self-Attention

Figure 4 for Light-Weight Vision Transformer with Parallel Local and Global Self-Attention

Abstract:While transformer architectures have dominated computer vision in recent years, these models cannot easily be deployed on hardware with limited resources for autonomous driving tasks that require real-time-performance. Their computational complexity and memory requirements limits their use, especially for applications with high-resolution inputs. In our work, we redesign the powerful state-of-the-art Vision Transformer PLG-ViT to a much more compact and efficient architecture that is suitable for such tasks. We identify computationally expensive blocks in the original PLG-ViT architecture and propose several redesigns aimed at reducing the number of parameters and floating-point operations. As a result of our redesign, we are able to reduce PLG-ViT in size by a factor of 5, with a moderate drop in performance. We propose two variants, optimized for the best trade-off between parameter count to runtime as well as parameter count to accuracy. With only 5 million parameters, we achieve 79.5$\%$ top-1 accuracy on the ImageNet-1K classification benchmark. Our networks demonstrate great performance on general vision benchmarks like COCO instance segmentation. In addition, we conduct a series of experiments, demonstrating the potential of our approach in solving various tasks specifically tailored to the challenges of autonomous driving and transportation.

* This paper has been accepted at IEEE Intelligent Transportation Systems Conference (ITSC), 2023

Via

Access Paper or Ask Questions