Rainer Stiefelhagen

Navigating Open Set Scenarios for Skeleton-based Action Recognition

Dec 11, 2023

Kunyu Peng, Cheng Yin, Junwei Zheng, Ruiping Liu, David Schneider, Jiaming Zhang, Kailun Yang, M. Saquib Sarfraz, Rainer Stiefelhagen, Alina Roitberg

Abstract:In real-world scenarios, human actions often fall outside the distribution of training data, making it crucial for models to recognize known actions and reject unknown ones. However, using pure skeleton data in such open-set conditions poses challenges due to the lack of visual background cues and the distinct sparse structure of body pose sequences. In this paper, we tackle the unexplored Open-Set Skeleton-based Action Recognition (OS-SAR) task and formalize the benchmark on three skeleton-based datasets. We assess the performance of seven established open-set approaches on our task and identify their limits and critical generalization issues when dealing with skeleton information. To address these challenges, we propose a distance-based cross-modality ensemble method that leverages the cross-modal alignment of skeleton joints, bones, and velocities to achieve superior open-set recognition performance. We refer to the key idea as CrossMax - an approach that utilizes a novel cross-modality mean max discrepancy suppression mechanism to align latent spaces during training and a cross-modality distance-based logits refinement method during testing. CrossMax outperforms existing approaches and consistently yields state-of-the-art results across all datasets and backbones. The benchmark, code, and models will be released at https://github.com/KPeng9510/OS-SAR.

* Accepted to AAAI 2024. The benchmark, code, and models will be released at https://github.com/KPeng9510/OS-SAR
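
The abstract above names three skeleton modalities: joints, bones, and velocities. A minimal sketch of how the latter two are commonly derived from joint coordinates follows. The parent-index list, the frame layout, and the zero-padding of the last velocity frame are assumptions made for illustration and are not taken from the paper or its repository.

```python
from typing import List

Vec = List[float]      # one joint position, e.g. [x, y, z]
Frame = List[Vec]      # all joints in one frame


def bones(frames: List[Frame], parents: List[int]) -> List[Frame]:
    """Bone vectors: each joint minus its parent joint (hypothetical parent list)."""
    return [
        [[c - frame[p][k] for k, c in enumerate(joint)]
         for joint, p in zip(frame, parents)]
        for frame in frames
    ]


def velocities(frames: List[Frame]) -> List[Frame]:
    """Frame-to-frame joint displacement; the last frame is zero-padded."""
    out = [
        [[b - a for a, b in zip(j0, j1)] for j0, j1 in zip(f0, f1)]
        for f0, f1 in zip(frames, frames[1:])
    ]
    out.append([[0.0] * len(j) for j in frames[-1]])
    return out


# Two frames, three joints; joint 0 is the root and acts as its own parent.
frames = [
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 2.0, 0.0]],
    [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0], [1.0, 3.0, 0.0]],
]
parents = [0, 0, 1]
bone_stream = bones(frames, parents)
velocity_stream = velocities(frames)
```

Each stream keeps the shape of the joint input, so the same backbone can in principle be applied to all three.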

Sliding Window FastEdit: A Framework for Lesion Annotation in Whole-body PET Images

Nov 24, 2023

Matthias Hadlich, Zdravko Marinov, Moon Kim, Enrico Nasca, Jens Kleesiek, Rainer Stiefelhagen

Abstract:Deep learning has revolutionized the accurate segmentation of diseases in medical imaging. However, achieving such results requires training with numerous manual voxel annotations. This requirement presents a challenge for whole-body Positron Emission Tomography (PET) imaging, where lesions are scattered throughout the body. To tackle this problem, we introduce SW-FastEdit - an interactive segmentation framework that accelerates the labeling by utilizing only a few user clicks instead of voxelwise annotations. While prior interactive models crop or resize PET volumes due to memory constraints, we use the complete volume with our sliding window-based interactive scheme. Our model outperforms existing non-sliding window interactive models on the AutoPET dataset and generalizes to the previously unseen HECKTOR dataset. A user study revealed that annotators achieve high-quality predictions with only 10 click iterations and a low perceived NASA-TLX workload. Our framework is implemented using MONAI Label and is available: https://github.com/matt3o/AutoPET2-Submission/

* 5 pages, 2 figures, 4 tables
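
The sliding-window idea in the abstract above can be sketched in one dimension: cover the full input with overlapping windows, run a model on each, and average the overlaps. This is only an illustration under stated assumptions. The function names, the 1D setting, and the averaging rule are hypothetical, and it does not use MONAI's own sliding-window utilities or the authors' code.

```python
from typing import Callable, List, Sequence


def window_starts(length: int, window: int, stride: int) -> List[int]:
    """Start offsets so that windows of size `window` cover [0, length)."""
    if window >= length:
        return [0]
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] != length - window:
        starts.append(length - window)  # extra window flush with the end
    return starts


def sliding_window_predict(
    signal: Sequence[float],
    window: int,
    stride: int,
    predict: Callable[[Sequence[float]], Sequence[float]],
) -> List[float]:
    """Run `predict` on each window and average predictions where windows overlap."""
    total = [0.0] * len(signal)
    count = [0] * len(signal)
    for start in window_starts(len(signal), window, stride):
        patch = signal[start:start + window]
        for offset, value in enumerate(predict(patch)):
            total[start + offset] += value
            count[start + offset] += 1
    return [t / c for t, c in zip(total, count)]


# Stand-in "model" that returns its input unchanged.
signal = [float(v) for v in range(1, 12)]
result = sliding_window_predict(signal, window=4, stride=3, predict=list)
```

For a 3D volume the same offsets would be computed per axis and combined, which keeps memory bounded by the window size rather than the volume size.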

Deep Interactive Segmentation of Medical Images: A Systematic Review and Taxonomy

Nov 23, 2023

Zdravko Marinov, Paul F. Jäger, Jan Egger, Jens Kleesiek, Rainer Stiefelhagen

Abstract:Interactive segmentation is a crucial research area in medical image analysis aiming to boost the efficiency of costly annotations by incorporating human feedback. This feedback takes the form of clicks, scribbles, or masks and allows for iterative refinement of the model output so as to efficiently guide the system towards the desired behavior. In recent years, deep learning-based approaches have propelled results to a new level causing a rapid growth in the field with 121 methods proposed in the medical imaging domain alone. In this review, we provide a structured overview of this emerging field featuring a comprehensive taxonomy, a systematic review of existing methods, and an in-depth analysis of current practices. Based on these contributions, we discuss the challenges and opportunities in the field. For instance, we find that there is a severe lack of comparison across methods which needs to be tackled by standardized baselines and benchmarks.

* 26 pages, 8 figures, 10 tables; Zdravko Marinov and Paul F. Jäger are co-first authors; This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Quantized Distillation: Optimizing Driver Activity Recognition Models for Resource-Constrained Environments

Nov 10, 2023

Calvin Tanama, Kunyu Peng, Zdravko Marinov, Rainer Stiefelhagen, Alina Roitberg

Abstract:Deep learning-based models are at the forefront of most driver observation benchmarks due to their remarkable accuracies but are also associated with high computational costs. This is challenging, as resources are often limited in real-world driving scenarios. This paper introduces a lightweight framework for resource-efficient driver activity recognition. The framework enhances 3D MobileNet, a neural architecture optimized for speed in video classification, by incorporating knowledge distillation and model quantization to balance model accuracy and computational efficiency. Knowledge distillation helps maintain accuracy while reducing the model size by leveraging soft labels from a larger teacher model (I3D), instead of relying solely on original ground truth data. Model quantization significantly lowers memory and computation demands by using lower precision integers for model weights and activations. Extensive testing on a public dataset for in-vehicle monitoring during autonomous driving demonstrates that this new framework achieves a threefold reduction in model size and a 1.4-fold improvement in inference time, compared to an already optimized architecture. The code for this study is available at https://github.com/calvintanama/qd-driver-activity-reco.

* Accepted at IROS 2023
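
The two ingredients named in the abstract above, soft-label knowledge distillation and integer quantization, can be sketched as follows. This is a generic textbook-style formulation, not the paper's implementation: the temperature, the weighting factor `alpha`, and the symmetric int8 scheme are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    norm = sum(exps)
    return [e / norm for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    temperature: float = 4.0,   # assumed value
    alpha: float = 0.5,         # assumed mix between soft and hard targets
) -> float:
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


weights = [0.5, -1.0, 0.25]
q, scale = quantize_int8(weights)
restored = [qi * scale for qi in q]  # dequantized approximation
```

The soft-label term vanishes when student and teacher logits agree, and the dequantized weights differ from the originals by at most half a quantization step.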

CoBEV: Elevating Roadside 3D Object Detection with Depth and Height Complementarity

Oct 18, 2023

Hao Shi, Chengshan Pang, Jiaming Zhang, Kailun Yang, Yuhao Wu, Huajian Ni, Yining Lin, Rainer Stiefelhagen, Kaiwei Wang

Abstract:Roadside camera-driven 3D object detection is a crucial task in intelligent transportation systems, which extends the perception range beyond the limitations of vision-centric vehicles and enhances road safety. While previous studies have limitations in using only depth or height information, we find both depth and height matter and they are in fact complementary. The depth feature encompasses precise geometric cues, whereas the height feature is primarily focused on distinguishing between various categories of height intervals, essentially providing semantic context. This insight motivates the development of Complementary-BEV (CoBEV), a novel end-to-end monocular 3D object detection framework that integrates depth and height to construct robust BEV representations. In essence, CoBEV estimates each pixel's depth and height distribution and lifts the camera features into 3D space for lateral fusion using the newly proposed two-stage complementary feature selection (CFS) module. A BEV feature distillation framework is also seamlessly integrated to further enhance the detection accuracy from the prior knowledge of the fusion-modal CoBEV teacher. We conduct extensive experiments on the public 3D detection benchmarks of roadside camera-based DAIR-V2X-I and Rope3D, as well as the private Supremind-Road dataset, demonstrating that CoBEV not only achieves the accuracy of the new state-of-the-art, but also significantly advances the robustness of previous methods in challenging long-distance scenarios and noisy camera disturbance, and enhances generalization by a large margin in heterologous settings with drastic changes in scene and camera parameters. For the first time, the vehicle AP score of a camera model reaches 80% on DAIR-V2X-I in terms of easy mode. The source code will be made publicly available at https://github.com/MasterHow/CoBEV.

Elevating Skeleton-Based Action Recognition with Efficient Multi-Modality Self-Supervision

Sep 21, 2023

Yiping Wei, Kunyu Peng, Alina Roitberg, Jiaming Zhang, Junwei Zheng, Ruiping Liu, Yufan Chen, Kailun Yang, Rainer Stiefelhagen

Abstract:Self-supervised representation learning for human action recognition has developed rapidly in recent years. Most of the existing works are based on skeleton data while using a multi-modality setup. These works overlook the differences in performance among modalities, which leads to the propagation of erroneous knowledge between modalities. Moreover, only three fundamental modalities, i.e., joints, bones, and motions, are used, and no additional modalities are explored. In this work, we first propose an Implicit Knowledge Exchange Module (IKEM) which alleviates the propagation of erroneous knowledge between low-performance modalities. Then, we further propose three new modalities to enrich the complementary information between modalities. Finally, to maintain efficiency when introducing new modalities, we propose a novel teacher-student framework, named relational cross-modality knowledge distillation, to distill the knowledge from the secondary modalities into the mandatory modalities considering the relationship constrained by anchors, positives, and negatives. The experimental results demonstrate the effectiveness of our approach, unlocking the efficient use of skeleton-based multi-modality data. Source code will be made publicly available at https://github.com/desehuileng0o0/IKEM.

Unveiling the Hidden Realm: Self-supervised Skeleton-based Action Recognition in Occluded Environments

Sep 21, 2023

Yifei Chen, Kunyu Peng, Alina Roitberg, David Schneider, Jiaming Zhang, Junwei Zheng, Ruiping Liu, Yufan Chen, Kailun Yang, Rainer Stiefelhagen

Abstract:To integrate action recognition methods into autonomous robotic systems, it is crucial to consider adverse situations involving target occlusions. Such a scenario, despite its practical relevance, is rarely addressed in existing self-supervised skeleton-based action recognition methods. To empower robots with the capacity to address occlusion, we propose a simple and effective method. We first pre-train using occluded skeleton sequences, then use k-means clustering (KMeans) on sequence embeddings to group semantically similar samples. Next, we employ K-nearest-neighbor (KNN) to fill in missing skeleton data based on the closest sample neighbors. Imputing incomplete skeleton sequences to create relatively complete sequences as input provides significant benefits to existing skeleton-based self-supervised models. Meanwhile, building on the state-of-the-art Partial Spatio-Temporal Learning (PSTL), we introduce an Occluded Partial Spatio-Temporal Learning (OPSTL) framework. This enhancement utilizes Adaptive Spatial Masking (ASM) for better use of high-quality, intact skeletons. The effectiveness of our imputation methods is verified on the challenging occluded versions of the NTURGB+D 60 and NTURGB+D 120. The source code will be made publicly available at https://github.com/cyfml/OPSTL.
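
The imputation step described in the abstract above (group samples by embedding, then fill missing skeleton data from nearest neighbors) might look roughly like the sketch below. It assumes cluster labels have already been produced, for example by k-means, and that each sequence is a flat list of coordinates with `None` marking occluded values. These data layouts, the function name, and the mean-of-neighbors fill rule are illustrative assumptions, not details from the paper.

```python
import math
from typing import List, Optional, Sequence

Sequence1D = List[Optional[float]]  # flattened coordinates, None = occluded


def knn_impute(
    sequences: Sequence[Sequence1D],
    embeddings: Sequence[Sequence[float]],
    clusters: Sequence[int],
    k: int = 2,
) -> List[Sequence1D]:
    """Fill None entries with the mean of the k nearest same-cluster neighbors."""
    filled = [list(seq) for seq in sequences]
    for i, seq in enumerate(sequences):
        if None not in seq:
            continue
        # Candidate donors: other samples assigned to the same cluster.
        candidates = [
            j for j in range(len(sequences))
            if j != i and clusters[j] == clusters[i]
        ]
        candidates.sort(key=lambda j: math.dist(embeddings[i], embeddings[j]))
        neighbours = candidates[:k]
        for pos, value in enumerate(seq):
            if value is None:
                donors = [
                    sequences[j][pos] for j in neighbours
                    if sequences[j][pos] is not None
                ]
                if donors:  # stays None when no neighbor observed this value
                    filled[i][pos] = sum(donors) / len(donors)
    return filled


sequences = [
    [1.0, None, 3.0],   # one occluded coordinate
    [1.0, 2.0, 3.0],
    [1.0, 4.0, 3.0],
    [9.0, 9.0, 9.0],    # different cluster, never used as a donor here
]
embeddings = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0]]
clusters = [0, 0, 0, 1]
imputed = knn_impute(sequences, embeddings, clusters, k=2)
```

Restricting the neighbor search to one cluster keeps donors semantically similar and shrinks the search space.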

AutoPET Challenge 2023: Sliding Window-based Optimization of U-Net

Sep 21, 2023

Matthias Hadlich, Zdravko Marinov, Rainer Stiefelhagen

Abstract:Tumor segmentation in medical imaging is crucial and relies on precise delineation. Fluorodeoxyglucose Positron-Emission Tomography (FDG-PET) is widely used in clinical practice to detect metabolically active tumors. However, FDG-PET scans may misinterpret irregular glucose consumption in healthy or benign tissues as cancer. Combining PET with Computed Tomography (CT) can enhance tumor segmentation by integrating metabolic and anatomic information. FDG-PET/CT scans are pivotal for cancer staging and reassessment, utilizing radiolabeled fluorodeoxyglucose to highlight metabolically active regions. Accurately distinguishing tumor-specific uptake from physiological uptake in normal tissues is a challenging aspect of precise tumor segmentation. The AutoPET challenge addresses this by providing a dataset of 1014 FDG-PET/CT studies, encouraging advancements in accurate tumor segmentation and analysis within the FDG-PET/CT domain. Code: https://github.com/matt3o/AutoPET2-Submission/

* 9 pages, 1 figure, MICCAI 2023 - AutoPET Challenge Submission

Towards Privacy-Supporting Fall Detection via Deep Unsupervised RGB2Depth Adaptation

Aug 23, 2023

Hejun Xiao, Kunyu Peng, Xiangsheng Huang, Alina Roitberg, Hao Li, Zhaohui Wang, Rainer Stiefelhagen

Abstract:Fall detection is a vital task in health monitoring, as it allows the system to trigger an alert, thereby enabling faster interventions when a person experiences a fall. Although most previous approaches rely on standard RGB video data, such detailed appearance-aware monitoring poses significant privacy concerns. Depth sensors, on the other hand, are better at preserving privacy as they merely capture the distance of objects from the sensor or camera, omitting color and texture information. In this paper, we introduce a privacy-supporting solution that makes the RGB-trained model applicable in the depth domain and utilizes depth data at test time for fall detection. To achieve cross-modal fall detection, we present an unsupervised RGB to Depth (RGB2Depth) cross-modal domain adaptation approach that leverages labeled RGB data and unlabeled depth data during training. Our proposed pipeline incorporates an intermediate domain module for feature bridging, modality adversarial loss for modality discrimination, classification loss for pseudo-labeled depth data and labeled source data, triplet loss that considers both source and target domains, and a novel adaptive loss weight adjustment method for improved coordination among various losses. Our approach achieves state-of-the-art results in the unsupervised RGB2Depth domain adaptation task for fall detection. Code is available at https://github.com/1015206533/privacy_supporting_fall_detection.

On Transferability of Driver Observation Models from Simulated to Real Environments in Autonomous Cars

Jul 31, 2023

Walter Morales-Alvarez, Novel Certad, Alina Roitberg, Rainer Stiefelhagen, Cristina Olaverri-Monreal

Abstract:For driver observation frameworks, clean datasets collected in controlled simulated environments often serve as the initial training ground. Yet, when deployed under real driving conditions, such simulator-trained models quickly face the problem of distributional shifts brought about by changing illumination, car model, variations in subject appearances, sensor discrepancies, and other environmental alterations. This paper investigates the viability of transferring video-based driver observation models from simulation to real-world scenarios in autonomous vehicles, given the frequent use of simulation data in this domain due to safety issues. To achieve this, we record a dataset featuring actual autonomous driving conditions and involving seven participants engaged in highly distracting secondary activities. To enable direct SIM to REAL transfer, our dataset was designed in accordance with an existing large-scale simulator dataset used as the training source. We utilize the Inflated 3D ConvNet (I3D) model, a popular choice for driver observation, with Gradient-weighted Class Activation Mapping (Grad-CAM) for detailed analysis of model decision-making. Though the simulator-based model clearly surpasses the random baseline, its recognition quality diminishes, with average accuracy dropping from 85.7% to 46.6%. We also observe strong variations across different behavior classes. This underscores the challenges of model transferability, facilitating our research of more robust driver observation systems capable of dealing with real driving conditions.
