4D mmWave radar sensors are well suited for city scale Intelligent Transportation Systems (ITS) given their long sensing range, weatherproof functionality, simple mechanical design, and low manufacturing cost. In this paper, we investigate radar-based ITS for scalable traffic analysis. Localization of these radar sensors in a city scale range is a fundamental task in ITS. For mobile ITS setups it requires more endeavor. To address this task, we propose a self-localization approach that matches two descriptions of "road": the one from the geometry of the motion trajectories of cumulatively observed vehicles, and the other one from the aerial laser scan. An ICP (iterative closest point) algorithm is used to register the motion trajectory into the road section of the laser scan to estimate the sensor pose. We evaluates the results and show that it outperforms other map-based radar localization methods, especially for the orientation estimation. Beyond the localization result, we project radar sensor data onto city scale laser scan and generate an scalable occupancy heat map as a traffic analysis tool. This is demonstrated using two radar sensors monitoring an urban area in the real world.
Anticipating future actions is inherently uncertain. Given an observed video segment containing ongoing actions, multiple subsequent actions can plausibly follow. This uncertainty becomes even larger when predicting far into the future. However, the majority of existing action anticipation models adhere to a deterministic approach, neglecting to account for future uncertainties. In this work, we rethink action anticipation from a generative view, employing diffusion models to capture different possible future actions. In this framework, future actions are iteratively generated from standard Gaussian noise in the latent space, conditioned on the observed video, and subsequently transitioned into the action space. Extensive experiments on four benchmark datasets, i.e., Breakfast, 50Salads, EpicKitchens, and EGTEA Gaze+, are performed and the proposed method achieves superior or comparable results to state-of-the-art methods, showing the effectiveness of a generative approach for action anticipation. Our code and trained models will be published on GitHub.
To ensure the security of airports, it is essential to protect the airside from unauthorized access. For this purpose, security fences are commonly used, but they require regular inspection to detect damages. However, due to the growing shortage of human specialists and the large manual effort, there is the need for automated methods. The aim is to automatically inspect the fence for damage with the help of an autonomous robot. In this work, we explore object detection methods to address the fence inspection task and localize various types of damages. In addition to evaluating four State-of-the-Art (SOTA) object detection models, we analyze the impact of several design criteria, aiming at adapting to the task-specific challenges. This includes contrast adjustment, optimization of hyperparameters, and utilization of modern backbones. The experimental results indicate that our optimized You Only Look Once v5 (YOLOv5) model achieves the highest accuracy of the four methods with an increase of 6.9% points in Average Precision (AP) compared to the baseline. Moreover, we show the real-time capability of the model. The trained models are published on GitHub: https://github.com/N-Friederich/airport_fence_inspection.
The ability to anticipate possible future human actions is essential for a wide range of applications, including autonomous driving and human-robot interaction. Consequently, numerous methods have been introduced for action anticipation in recent years, with deep learning-based approaches being particularly popular. In this work, we review the recent advances of action anticipation algorithms with a particular focus on daily-living scenarios. Additionally, we classify these methods according to their primary contributions and summarize them in tabular form, allowing readers to grasp the details at a glance. Furthermore, we delve into the common evaluation metrics and datasets used for action anticipation and provide future directions with systematical discussions.
Since 2010, multiple cyber incidents on industrial infrastructure, such as Stuxnet and CrashOverride, have exposed the vulnerability of Industrial Control Systems (ICS) to cyber threats. The industrial systems are commissioned for longer duration amounting to decades, often resulting in non-compliance to technological advancements in industrial cybersecurity mechanisms. The unavailability of network infrastructure information makes designing the security policies or configuring the cybersecurity countermeasures such as Network Intrusion Detection Systems (NIDS) challenging. An empirical solution is to self-learn the network infrastructure information of an industrial system from its monitored network traffic to make the network transparent for downstream analyses tasks such as anomaly detection. In this work, a Python-based industrial communication paradigm-aware framework, named PROFINET Operations Enumeration and Tracking (POET), that enumerates different industrial operations executed in a deterministic order of a PROFINET-based industrial system is reported. The operation-driving industrial network protocol frames are dissected for enumeration of the operations. For the requirements of capturing the transitions between industrial operations triggered by the communication events, the Finite State Machines (FSM) are modelled to enumerate the PROFINET operations of the device, connection and system. POET extracts the network information from network traffic to instantiate appropriate FSM models (Device, Connection or System) and track the industrial operations. It successfully detects and reports the anomalies triggered by a network attack in a miniaturized PROFINET-based industrial system, executed through valid network protocol exchanges and resulting in invalid PROFINET operation transition for the device.
Modeling a 3D volumetric shape as an assembly of decomposed shape parts is much more challenging, but semantically more valuable than direct reconstruction from a full shape representation. The neural network needs to implicitly learn part relations coherently, which is typically performed by dedicated network layers that can generate transformation matrices for each part. In this paper, we propose a VoxAttention network architecture for attention-based part assembly. We further propose a variant of using channel-wise part attention and show the advantages of this approach. Experimental results show that our method outperforms most state-of-the-art methods for the part relation-aware 3D shape modeling task.
Deep neural networks for scene perception in automated vehicles achieve excellent results for the domains they were trained on. However, in real-world conditions, the domain of operation and its underlying data distribution are subject to change. Adverse weather conditions, in particular, can significantly decrease model performance when such data are not available during training.Additionally, when a model is incrementally adapted to a new domain, it suffers from catastrophic forgetting, causing a significant drop in performance on previously observed domains. Despite recent progress in reducing catastrophic forgetting, its causes and effects remain obscure. Therefore, we study how the representations of semantic segmentation models are affected during domain-incremental learning in adverse weather conditions. Our experiments and representational analyses indicate that catastrophic forgetting is primarily caused by changes to low-level features in domain-incremental learning and that learning more general features on the source domain using pre-training and image augmentations leads to efficient feature reuse in subsequent tasks, which drastically reduces catastrophic forgetting. These findings highlight the importance of methods that facilitate generalized features for effective continual learning algorithms.
Point cloud sampling is a less explored research topic for this data representation. The most common sampling methods nowadays are still classical random sampling and farthest point sampling. With the development of neural networks, various methods have been proposed to sample point clouds in a task-based learning manner. However, these methods are mostly generative-based, other than selecting points directly with mathematical statistics. Inspired by the Canny edge detection algorithm for images and with the help of the attention mechanism, this paper proposes a non-generative Attention-based Point cloud Edge Sampling method (APES), which can capture the outline of input point clouds. Experimental results show that better performances are achieved with our sampling method due to the important outline information it learned.
Research in the field of Continual Semantic Segmentation is mainly investigating novel learning algorithms to overcome catastrophic forgetting of neural networks. Most recent publications have focused on improving learning algorithms without distinguishing effects caused by the choice of neural architecture.Therefore, we study how the choice of neural network architecture affects catastrophic forgetting in class- and domain-incremental semantic segmentation. Specifically, we compare the well-researched CNNs to recently proposed Transformers and Hybrid architectures, as well as the impact of the choice of novel normalization layers and different decoder heads. We find that traditional CNNs like ResNet have high plasticity but low stability, while transformer architectures are much more stable. When the inductive biases of CNN architectures are combined with transformers in hybrid architectures, it leads to higher plasticity and stability. The stability of these models can be explained by their ability to learn general features that are robust against distribution shifts. Experiments with different normalization layers show that Continual Normalization achieves the best trade-off in terms of adaptability and stability of the model. In the class-incremental setting, the choice of the normalization layer has much less impact. Our experiments suggest that the right choice of architecture can significantly reduce forgetting even with naive fine-tuning and confirm that for real-world applications, the architecture is an important factor in designing a continual learning model.
On robotics computer vision tasks, generating and annotating large amounts of data from real-world for the use of deep learning-based approaches is often difficult or even impossible. A common strategy for solving this problem is to apply simulation-to-reality (sim2real) approaches with the help of simulated scenes. While the majority of current robotics vision sim2real work focuses on image data, we present an industrial application case that uses sim2real transfer learning for point cloud data. We provide insights on how to generate and process synthetic point cloud data in order to achieve better performance when the learned model is transferred to real-world data. The issue of imbalanced learning is investigated using multiple strategies. A novel patch-based attention network is proposed additionally to tackle this problem.