
Mohamed Afifi


Robust Real-Time Pedestrian Detection on Embedded Devices

Dec 13, 2020
Mohamed Afifi, Yara Ali, Karim Amer, Mahmoud Shaker, Mohamed Elhelw

Figures 1–4 (not shown)

Detection of pedestrians on embedded devices, such as those on board robots and drones, has many applications including road intersection monitoring, security, crowd monitoring and surveillance, to name a few. However, the problem is challenging due to continuously-changing camera viewpoints and varying object appearances, as well as the need for lightweight algorithms suitable for embedded systems. This paper proposes a robust framework for pedestrian detection across a wide range of footage. The framework performs fine and coarse detections on different image regions and exploits temporal and spatial characteristics to attain enhanced accuracy and real-time performance on embedded boards. The framework uses the Yolo-v3 object detector [1] as its backbone and runs on the Nvidia Jetson TX2 embedded board, though other detectors and/or boards can be used as well. The performance of the framework is demonstrated on two established datasets, and it achieved second place in the CVPR 2019 Embedded Real-Time Inference (ERTI) Challenge.
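The fine/coarse scheme described in the abstract can be sketched as follows. This is an illustrative simplification, not the paper's actual method: `detect` is a hypothetical stand-in for the Yolo-v3 backbone, and the "hot region" bookkeeping is an assumed way of exploiting temporal characteristics (regions that recently contained pedestrians get a full-resolution pass).

```python
# Illustrative sketch: alternate fine and coarse detection over image regions,
# using detections from previous frames to decide which regions deserve a
# full-resolution (fine) pass. `detect` is a hypothetical stub standing in
# for a real backbone such as Yolo-v3.

def detect(region, fine):
    """Hypothetical detector stub. `fine=True` mimics full-resolution
    inference (all boxes found); `fine=False` mimics a cheaper downscaled
    pass that may miss detections (here: at most one box)."""
    boxes = region.get("boxes", [])
    return boxes if fine else boxes[:1]

def schedule_regions(regions, hot_ids):
    """Run fine detection on 'hot' regions (those that contained pedestrians
    recently), coarse detection elsewhere; return all detections plus the
    updated hot set for the next frame."""
    detections, next_hot = [], set()
    for rid, region in enumerate(regions):
        boxes = detect(region, fine=(rid in hot_ids))
        detections.extend(boxes)
        if boxes:
            next_hot.add(rid)
    return detections, next_hot
```

In a real pipeline the hot set would carry spatial context between frames, so expensive full-resolution inference is spent only where pedestrians are likely, which is how such schemes trade accuracy for embedded-board throughput.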


Multi Projection Fusion for Real-time Semantic Segmentation of 3D LiDAR Point Clouds

Nov 06, 2020
Yara Ali Alnaggar, Mohamed Afifi, Karim Amer, Mohamed Elhelw

Figures 1–4 (not shown)

Semantic segmentation of 3D point cloud data is essential for enhanced high-level perception in autonomous platforms. Furthermore, given the increasing deployment of LiDAR sensors on board cars and drones, special emphasis is also placed on non-computationally-intensive algorithms that operate on mobile GPUs. Previous efficient state-of-the-art methods relied on 2D spherical projection of point clouds as input to 2D fully convolutional neural networks to balance the accuracy-speed trade-off. This paper introduces a novel approach for 3D point cloud semantic segmentation that exploits multiple projections of the point cloud to mitigate the loss of information inherent in single-projection methods. Our Multi-Projection Fusion (MPF) framework processes spherical and bird's-eye view projections using two separate highly-efficient 2D fully convolutional models, then combines the segmentation results of both views. The proposed framework is validated on the SemanticKITTI dataset, where it achieves a mIoU of 55.5, higher than the state-of-the-art projection-based methods RangeNet++ and PolarNet, while being 1.6x faster than the former and 3.1x faster than the latter.
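The fusion step can be sketched as below. Note this is a hedged illustration under assumptions: the abstract does not specify how the two views' results are combined, so this sketch simply averages per-point class scores from the two (hypothetical) 2D FCN outputs; the actual MPF combination may differ.

```python
# Minimal sketch of fusing per-point class scores from two projection views
# (spherical/range view and bird's-eye view), in the spirit of multi-projection
# fusion. In a real system each score list would come from a 2D FCN's output
# back-projected to the points; here they are plain Python lists.

def fuse_views(range_scores, bev_scores):
    """Average per-point class scores from the two views and take the argmax.
    Each argument is a list of per-point score lists (one score per class).
    Averaging is an assumed fusion rule, not necessarily the paper's."""
    labels = []
    for rs, bs in zip(range_scores, bev_scores):
        fused = [(r + b) / 2.0 for r, b in zip(rs, bs)]
        labels.append(fused.index(max(fused)))
    return labels
```

Fusing at the score level lets a confident prediction in one view override an uncertain one in the other, which is the usual motivation for combining projections that lose different information.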

* Accepted at the 2021 Winter Conference on Applications of Computer Vision (WACV 2021) 

Robust Real-time Pedestrian Detection in Aerial Imagery on Jetson TX2

May 16, 2019
Mohamed Afifi, Yara Ali, Karim Amer, Mahmoud Shaker, Mohamed ElHelw

Figure 1 (not shown)

Detection of pedestrians in aerial imagery captured by drones has many applications including intersection monitoring, patrolling, and surveillance, to name a few. However, the problem is challenging due to the continuously-changing camera viewpoint and object appearance, as well as the need for lightweight algorithms that run on on-board embedded systems. To address this, the paper proposes a framework for pedestrian detection in videos based on the YOLO object detection network [6] that achieves a throughput of more than 5 FPS on the Jetson TX2 embedded board. The framework exploits deep learning for robust operation and uses a pre-trained model without the need for any additional training, which makes it flexible to apply to different setups with a minimal amount of tuning. The method achieves ~81 mAP when applied to a sample video from the Embedded Real-Time Inference (ERTI) Challenge where pedestrians are monitored by a UAV.
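The per-frame inference loop with throughput measurement can be sketched as follows. Everything here is illustrative: `detect_pedestrians` is a hypothetical stub for the pre-trained YOLO forward pass, and the FPS accounting simply mirrors the >5 FPS throughput figure quoted in the abstract.

```python
import time

def detect_pedestrians(frame):
    """Hypothetical stand-in for a pre-trained YOLO forward pass; returns
    dummy bounding boxes as (x, y, w, h) tuples."""
    return [(10, 20, 50, 120)]

def run_stream(frames):
    """Run the detector over a frame stream and report effective FPS.
    A real deployment would read frames from a camera and compare the
    measured FPS against the real-time budget (here, >5 FPS on a TX2)."""
    start = time.perf_counter()
    results = [detect_pedestrians(f) for f in frames]
    elapsed = time.perf_counter() - start
    fps = len(frames) / elapsed if elapsed > 0 else float("inf")
    return results, fps
```

Because the model is used as-is (no fine-tuning), the only deployment work in such a setup is wiring the frame source and checking that the measured FPS meets the real-time budget.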
