Illicit object detection is a critical task performed at various high-security locations, including airports, train stations, subways, and ports. The continuous and tedious work of examining thousands of X-ray images per hour can be mentally taxing. Thus, Deep Neural Networks (DNNs) can be used to automate the X-ray image analysis process, improve efficiency and alleviate the security officers' inspection burden. The neural architectures typically utilized in relevant literature are Convolutional Neural Networks (CNNs), with Vision Transformers (ViTs) rarely employed. In order to address this gap, this paper conducts a comprehensive evaluation of relevant ViT architectures on illicit item detection in X-ray images. This study utilizes both Transformer and hybrid backbones, such as SWIN and NextViT, and detectors, such as DINO and RT-DETR. The results demonstrate the remarkable accuracy of the DINO Transformer detector in the low-data regime, the impressive real-time performance of YOLOv8, and the effectiveness of the hybrid NextViT backbone.
This paper presents a formation control approach for contactless gesture-based Human-Swarm Interaction (HSI) between a team of multi-rotor Unmanned Aerial Vehicles (UAVs) and a human worker. The approach is intended for monitoring the safety of human workers, especially those working at heights. In the proposed dynamic formation scheme, one UAV acts as the leader of the formation and is equipped with sensors for human worker detection and gesture recognition. The follower UAVs maintain a predetermined formation relative to the worker's position, thereby providing additional perspectives of the monitored scene. Hand gestures allow the human worker to specify movements and action commands for the UAV team and initiate other mission-related commands without the need for an additional communication channel or specific markers. Together with a novel unified human detection and tracking algorithm, human pose estimation approach and gesture detection pipeline, the proposed approach forms a first instance of an HSI system incorporating all these modules onboard real-world UAVs. Simulations and field experiments with three UAVs and a human worker in a mock-up scenario showcase the effectiveness and responsiveness of the proposed approach.
Automated visual firearms classification from RGB images is an important real-world task with applications in public space security, intelligence gathering and law enforcement investigations. When applied to images massively crawled from the World Wide Web (including social media and dark Web sites), it can serve as an important component of systems that attempt to identify criminal firearms trafficking networks, by analyzing Big Data from open-source intelligence. Deep Neural Networks (DNN) are the state-of-the-art methodology for achieving this, with Convolutional Neural Networks (CNN) being typically employed. The common transfer learning approach consists of pretraining on a large-scale, generic annotated dataset for whole-image classification, such as ImageNet-1k, and then finetuning the DNN on a smaller, annotated, task-specific, downstream dataset for visual firearms classification. Neither Visual Transformer (ViT) neural architectures nor Self-Supervised Learning (SSL) approaches have been so far evaluated on this critical task. SSL essentially consists of replacing the traditional supervised pretraining objective with an unsupervised pretext task that does not require ground-truth labels..
Automated detection of contraband items in X-ray images can significantly increase public safety, by enhancing the productivity and alleviating the mental load of security officers in airports, subways, customs/post offices, etc. The large volume and high throughput of passengers, mailed parcels, etc., during rush hours practically make it a Big Data problem. Modern computer vision algorithms relying on Deep Neural Networks (DNNs) have proven capable of undertaking this task even under resource-constrained and embedded execution scenarios, e.g., as is the case with fast, single-stage object detectors. However, no comparative experimental assessment of the various relevant DNN components/methods has been performed under a common evaluation protocol, which means that reliable cross-method comparisons are missing. This paper presents exactly such a comparative assessment, utilizing a public relevant dataset and a well-defined methodology for selecting the specific DNN components/modules that are being evaluated. The results indicate the superiority of Transformer detectors, the obsolete nature of auxiliary neural modules that have been developed in the past few years for security applications and the efficiency of the CSP-DarkNet backbone CNN.
The adoption of Deep Neural Networks (DNNs) has greatly benefited Natural Language Processing (NLP) during the past decade. However, the demands of long documents analysis are quite different from those of shorter texts, with the ever increasing size of documents uploaded online rendering NLP on long documents a critical area of research. This paper surveys the current state-of-the-art in the domain, overviewing the relevant neural building blocks and subsequently focusing on two main NLP tasks: Document Classification, Summarization as well as mentioning uses in Sentiment Analysis. We detail the challenges, issues and current solutions related to long-document NLP. We also list publicly available, labelled, long-document datasets used in current research.
Automated detection of contraband items in X-ray images can significantly increase public safety, by enhancing the productivity and alleviating the mental load of security officers in airports, subways, customs/post offices, etc. The large volume and high throughput of passengers, mailed parcels, etc., during rush hours make it a Big Data analysis task. Modern computer vision algorithms relying on Deep Neural Networks (DNNs) have proven capable of undertaking this task even under resource-constrained and embedded execution scenarios, e.g., as is the case with fast, single-stage, anchor-based object detectors. This paper proposes a two-fold improvement of such algorithms for the X-ray analysis domain, introducing two complementary novelties. Firstly, more efficient anchors are obtained by hierarchical clustering the sizes of the ground-truth training set bounding boxes; thus, the resulting anchors follow a natural hierarchy aligned with the semantic structure of the data. Secondly, the default Non-Maximum Suppression (NMS) algorithm at the end of the object detection pipeline is modified to better handle occluded object detection and to reduce the number of false predictions, by inserting the Efficient Intersection over Union (E-IoU) metric into the Weighted Cluster NMS method. E-IoU provides more discriminative geometrical correlations between the candidate bounding boxes/Regions-of-Interest (RoIs). The proposed method is implemented on a common single-stage object detector (YOLOv5) and its experimental evaluation on a relevant public dataset indicates significant accuracy gains over both the baseline and competing approaches. This highlights the potential of Big Data analysis in enhancing public safety.