Abstract: Gaze target detection (GTD) is the task of predicting where a person in an image is looking. This is a challenging task, as it requires understanding the relationship between the person's head, body, and eyes, as well as the surrounding environment. In this paper, we propose a novel method for GTD that fuses multiple pieces of information extracted from an image. First, we project the 2D image into a 3D representation using monocular depth estimation. We then extract a depth-infused saliency module map, which highlights the most salient (attention-grabbing) regions in the image for the subject under consideration. We also extract face and depth modalities from the image, and finally fuse all the extracted modalities to identify the gaze target. We quantitatively evaluated our method, including an ablation analysis, on three publicly available datasets, namely VideoAttentionTarget, GazeFollow and GOO-Real, and showed that it outperforms other state-of-the-art methods. This suggests that our method is a promising new approach for GTD.
Abstract: Traffic congestion has been a major challenge in many urban road networks. Extensive research studies have been conducted to highlight traffic-related congestion and address the issue using data-driven approaches. Currently, most traffic congestion analyses are done using simulation software that offers limited insight because of limitations in the tools and utilities used to render various traffic congestion scenarios. This affects the formulation of custom business problems, which vary from place to place and country to country. By exploiting the power of the knowledge graph, we model a traffic congestion problem as a Neo4j graph and then use a load-balancing optimization algorithm to identify congestion-free road networks. We also show how traffic propagates backward in case of congestion or accident scenarios and its overall impact on other segments of the roads. We also train a sequential RNN-LSTM (Long Short-Term Memory) deep learning model on real-time traffic data to assess the accuracy of simulation results based on road-specific congestion. Our results show that graph-based traffic simulation, supplemented by AI/ML-based traffic prediction, can be more effective in estimating the congestion level in a road network.
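The backward propagation of congestion mentioned above can be illustrated with a small sketch. It is not the authors' Neo4j model: it assumes a plain Python dictionary as the road graph, and the function name `upstream_segments`, the segment labels, and the hop limit are all hypothetical. It walks the graph against the direction of travel to list the segments that feed into a blocked one.

```python
from collections import deque


def upstream_segments(downstream_of, blocked, max_hops):
    """Return {segment: hops} for segments that feed into `blocked`.

    `downstream_of` maps each segment to the segments traffic flows into next.
    This is an illustrative assumption, not the paper's graph schema.
    """
    # Reverse the edges so we can walk against the direction of travel.
    upstream_of = {}
    for src, dsts in downstream_of.items():
        for dst in dsts:
            upstream_of.setdefault(dst, []).append(src)

    # Breadth-first search from the blocked segment, limited to max_hops.
    hops = {blocked: 0}
    queue = deque([blocked])
    while queue:
        seg = queue.popleft()
        if hops[seg] == max_hops:
            continue
        for up in upstream_of.get(seg, []):
            if up not in hops:
                hops[up] = hops[seg] + 1
                queue.append(up)

    del hops[blocked]  # report only the affected upstream segments
    return hops


# Hypothetical network: A -> B -> C and D -> C, with an incident on C.
roads = {"A": ["B"], "B": ["C"], "D": ["C"]}
one_hop = upstream_segments(roads, "C", max_hops=1)
two_hops = upstream_segments(roads, "C", max_hops=2)
```

With this toy network, segments B and D are reached after one hop and A after two, which mirrors the idea that a blockage first affects the segments immediately behind it.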
Abstract: Can a neural network estimate an object's dimension in the wild? In this paper, we propose a method and deep learning architecture to estimate the dimensions of a quadrilateral object of interest in videos using a monocular camera. The proposed technique does not use camera calibration or handcrafted geometric features; instead, features are learned with the help of the coefficients of a segmentation neural network during the training process. A real-time instance segmentation-based deep neural network with a ResNet50 backbone is employed, which gives the object's prototype mask and thus provides a region of interest from which to regress its dimensions. The instance segmentation network is trained to look at only the nearest object of interest. The regression is performed using an MLP head that looks only at the mask coefficients of the bounding box detector head and the prototype segmentation mask. We trained the system with three different random cameras, achieving a mean absolute percentage error (MAPE) of 22% on the test dataset for dimension estimation.
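For readers unfamiliar with the metric reported above, the following is a minimal sketch of how MAPE is commonly computed. The function name `mape` and the sample values are illustrative assumptions and are not taken from the paper.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, returned in percent."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Each term is the absolute error relative to the true value.
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


# Made-up true and predicted object widths, in centimetres.
true_cm = [10.0, 20.0, 40.0]
pred_cm = [12.0, 18.0, 40.0]
error_percent = mape(true_cm, pred_cm)  # relative errors: 20%, 10%, 0%
```

Under this definition, a MAPE of 22% means the predicted dimensions deviate from the true ones by 22% on average. The metric is undefined when a true value is zero.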
Abstract: The fact that almost every person owns a smartphone device that can be precisely located is both empowering and worrying. If methods for accurate tracking of devices (and their owners) via WiFi probing are developed in a responsible way, they could be applied in many different fields, from data security to urban planning. Numerous approaches to data collection and analysis have been covered, some of which use active sensing equipment, while others rely on passive probing, which takes advantage of nearly universal smartphone usage and WiFi network coverage. In this study, we introduce a system that uses WiFi probing technologies aimed at tracking user locations and understanding individual behavior. We built our own devices to passively capture WiFi probe request packets from smartphones, without the phones being connected to the network. The devices were tested at the headquarters of the research sector of the Elm Company. The results of the analyses carried out to estimate the crowd density in offices and the flows of the crowd from one place to another are promising and illustrate the importance of such solutions in indoor and closed spaces.
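One simple way to turn captured probe requests into a crowd-density estimate is to count distinct device identifiers per sensor and time window. The sketch below assumes this approach for illustration only; the record layout `(timestamp_seconds, sensor_id, device_id)`, the function name, and the 60-second window are hypothetical and do not describe the authors' pipeline.

```python
from collections import defaultdict


def density_per_window(probes, window_s=60):
    """Count distinct devices seen by each sensor in each time window.

    `probes` is an iterable of (timestamp_seconds, sensor_id, device_id)
    tuples. This layout is an assumption made for the example.
    """
    seen = defaultdict(set)
    for ts, sensor, device in probes:
        window = int(ts // window_s)  # index of the window this probe falls in
        seen[(sensor, window)].add(device)
    return {key: len(devices) for key, devices in seen.items()}


# Made-up captures from two hypothetical sensors.
probes = [
    (0, "office1", "aa"),
    (10, "office1", "aa"),  # repeat probe from the same device
    (20, "office1", "bb"),
    (70, "office1", "bb"),
    (75, "office2", "aa"),
]
density = density_per_window(probes)
```

In this toy data, device "aa" is seen at office1 in the first window and at office2 in the second, which is the kind of transition a flow analysis between places would build on.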
Abstract: The problem of automated car damage assessment presents a major challenge in the auto repair and damage assessment industry. The domain has several application areas, ranging from car assessment companies such as car rentals and body shops to accidental damage assessment for car insurance companies. In vehicle assessment, the damage can take any form, from scratches and minor or major dents to missing parts. Often, the assessment area has a significant level of noise such as dirt, grease, oil or rust that makes accurate identification challenging. Moreover, the identification of a particular part is the first step in the repair industry towards an accurate labour and part assessment, where the presence of different car models, shapes and sizes makes the task even more challenging for a machine-learning model to perform well. To address these challenges, this research explores and applies various instance segmentation methodologies to evaluate the best performing models. The scope of this work focusses on two genres of real-time instance segmentation models due to their industrial significance, namely SipMask and Yolact. These methodologies are evaluated against a previously reported car parts dataset (DSMLR) and an internally curated dataset extracted from local car repair workshops. The Yolact-based part localization and segmentation method performed well when compared to other real-time instance mechanisms, with a mAP of 66.5. For the workshop repair dataset, SipMask++ reported better accuracies for object detection with a mAP of 57.0, with outcomes for AP_IoU=.50 and AP_IoU=.75 reporting 72.0 and 67.0 respectively, while Yolact was found to be a better performer for AP_s with 44.0 and 2.6 for object detection and segmentation categories respectively.
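The AP_IoU=.50 and AP_IoU=.75 figures above depend on an intersection-over-union (IoU) threshold that decides whether a prediction counts as a match. The sketch below shows the IoU computation for axis-aligned boxes only; the `(x1, y1, x2, y2)` box format, the function name, and the sample boxes are assumptions for illustration, and the full AP calculation is not reproduced here.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if there is one.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Made-up ground-truth box and two predictions for the same car part.
truth = (0, 0, 10, 10)
tight = (0, 0, 10, 8)   # covers 80% of the ground truth
loose = (0, 0, 10, 6)   # covers 60% of the ground truth

tight_iou = box_iou(truth, tight)
loose_iou = box_iou(truth, loose)
```

Here the tighter prediction would count as a match at both the 0.50 and 0.75 thresholds, while the looser one would only match at 0.50. This is why AP at IoU=.75 is generally no higher than AP at IoU=.50 for the same model.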