Abstract:Image inpainting is currently a hot topic within the field of computer vision. It offers a viable solution for various applications, including photographic restoration, video editing, and medical imaging. Deep learning advancements, notably convolutional neural networks (CNNs) and generative adversarial networks (GANs), have significantly enhanced the inpainting task with an improved capability to fill missing or damaged regions in an image or video through the incorporation of contextually appropriate details. These advancements have improved other aspects, including efficiency, information preservation, and achieving both realistic textures and structures. Recently, visual transformers have been exploited and offer some improvements to image or video inpainting. The advent of transformer-based architectures, which were initially designed for natural language processing, has also been integrated into computer vision tasks. These methods utilize self-attention mechanisms that excel in capturing long-range dependencies within data; therefore, they are particularly effective for tasks requiring a comprehensive understanding of the global context of an image or video. In this paper, we provide a comprehensive review of the current image or video inpainting approaches, with a specific focus on transformer-based techniques, with the goal to highlight the significant improvements and provide a guideline for new researchers in the field of image or video inpainting using visual transformers. We categorized the transformer-based techniques by their architectural configurations, types of damage, and performance metrics. Furthermore, we present an organized synthesis of the current challenges, and suggest directions for future research in the field of image or video inpainting.
Abstract:The Unmanned Aerial Vehicles (UAVs) market has been significantly growing and Considering the availability of drones at low-cost prices the possibility of misusing them, for illegal purposes such as drug trafficking, spying, and terrorist attacks posing high risks to national security, is rising. Therefore, detecting and tracking unauthorized drones to prevent future attacks that threaten lives, facilities, and security, become a necessity. Drone detection can be performed using different sensors, while image-based detection is one of them due to the development of artificial intelligence techniques. However, knowing unauthorized drone types is one of the challenges due to the lack of drone types datasets. For that, in this paper, we provide a dataset of various drones as well as a comparison of recognized object detection models on the proposed dataset including YOLO algorithms with their different versions, like, v3, v4, and v5 along with the Detectronv2. The experimental results of different models are provided along with a description of each method. The collected dataset can be found in https://drive.google.com/drive/folders/1EPOpqlF4vG7hp4MYnfAecVOsdQ2JwBEd?usp=share_link
Abstract:Three-dimensional (3D) point cloud analysis has become one of the attractive subjects in realistic imaging and machine visions due to its simplicity, flexibility and powerful capacity of visualization. Actually, the representation of scenes and buildings using 3D shapes and formats leveraged many applications among which automatic driving, scenes and objects reconstruction, etc. Nevertheless, working with this emerging type of data has been a challenging task for objects representation, scenes recognition, segmentation, and reconstruction. In this regard, a significant effort has recently been devoted to developing novel strategies, using different techniques such as deep learning models. To that end, we present in this paper a comprehensive review of existing tasks on 3D point cloud: a well-defined taxonomy of existing techniques is performed based on the nature of the adopted algorithms, application scenarios, and main objectives. Various tasks performed on 3D point could data are investigated, including objects and scenes detection, recognition, segmentation and reconstruction. In addition, we introduce a list of used datasets, we discuss respective evaluation metrics and we compare the performance of existing solutions to better inform the state-of-the-art and identify their limitations and strengths. Lastly, we elaborate on current challenges facing the subject of technology and future trends attracting considerable interest, which could be a starting point for upcoming research studies
Abstract:To understand the real world using various types of data, Artificial Intelligence (AI) is the most used technique nowadays. While finding the pattern within the analyzed data represents the main task. This is performed by extracting representative features step, which is proceeded using the statistical algorithms or using some specific filters. However, the selection of useful features from large-scale data represented a crucial challenge. Now, with the development of convolution neural networks (CNNs), the feature extraction operation has become more automatic and easier. CNNs allow to work on large-scale size of data, as well as cover different scenarios for a specific task. For computer vision tasks, convolutional networks are used to extract features also for the other parts of a deep learning model. The selection of a suitable network for feature extraction or the other parts of a DL model is not random work. So, the implementation of such a model can be related to the target task as well as the computational complexity of it. Many networks have been proposed and become the famous networks used for any DL models in any AI task. These networks are exploited for feature extraction or at the beginning of any DL model which is named backbones. A backbone is a known network trained in many other tasks before and demonstrates its effectiveness. In this paper, an overview of the existing backbones, e.g. VGGs, ResNets, DenseNet, etc, is given with a detailed description. Also, a couple of computer vision tasks are discussed by providing a review of each task regarding the backbones used. In addition, a comparison in terms of performance is also provided, based on the backbone used for each task.
Abstract:Image segmentation for video analysis plays an essential role in different research fields such as smart city, healthcare, computer vision and geoscience, and remote sensing applications. In this regard, a significant effort has been devoted recently to developing novel segmentation strategies; one of the latest outstanding achievements is panoptic segmentation. The latter has resulted from the fusion of semantic and instance segmentation. Explicitly, panoptic segmentation is currently under study to help gain a more nuanced knowledge of the image scenes for video surveillance, crowd counting, self-autonomous driving, medical image analysis, and a deeper understanding of the scenes in general. To that end, we present in this paper the first comprehensive review of existing panoptic segmentation methods to the best of the authors' knowledge. Accordingly, a well-defined taxonomy of existing panoptic techniques is performed based on the nature of the adopted algorithms, application scenarios, and primary objectives. Moreover, the use of panoptic segmentation for annotating new datasets by pseudo-labeling is discussed. Moving on, ablation studies are carried out to understand the panoptic methods from different perspectives. Moreover, evaluation metrics suitable for panoptic segmentation are discussed, and a comparison of the performance of existing solutions is provided to inform the state-of-the-art and identify their limitations and strengths. Lastly, the current challenges the subject technology faces and the future trends attracting considerable interest in the near future are elaborated, which can be a starting point for the upcoming research studies. The papers provided with code are available at: https://github.com/elharroussomar/Awesome-Panoptic-Segmentation
Abstract:Crowd counting on the drone platform is an interesting topic in computer vision, which brings new challenges such as small object inference, background clutter and wide viewpoint. However, there are few algorithms focusing on crowd counting on the drone-captured data due to the lack of comprehensive datasets. To this end, we collect a large-scale dataset and organize the Vision Meets Drone Crowd Counting Challenge (VisDrone-CC2020) in conjunction with the 16th European Conference on Computer Vision (ECCV 2020) to promote the developments in the related fields. The collected dataset is formed by $3,360$ images, including $2,460$ images for training, and $900$ images for testing. Specifically, we manually annotate persons with points in each video frame. There are $14$ algorithms from $15$ institutes submitted to the VisDrone-CC2020 Challenge. We provide a detailed analysis of the evaluation results and conclude the challenge. More information can be found at the website: \url{http://www.aiskyeye.com/}.
Abstract:The novelty of the COVID-19 disease and the speed of spread has created a colossal chaos, impulse among researchers worldwide to exploit all the resources and capabilities to understand and analyze characteristics of the coronavirus in term of the ways it spreads and virus incubation time. For that, the existing medical features like CT and X-ray images are used. For example, CT-scan images can be used for the detection of lung infection. But the challenges of these features such as the quality of the image and infection characteristics limitate the effectiveness of these features. Using artificial intelligence (AI) tools and computer vision algorithms, the accuracy of detection can be more accurate and can help to overcome these issues. This paper proposes a multi-task deep-learning-based method for lung infection segmentation using CT-scan images. Our proposed method starts by segmenting the lung regions that can be infected. Then, segmenting the infections in these regions. Also, to perform a multi-class segmentation the proposed model is trained using the two-stream inputs. The multi-task learning used in this paper allows us to overcome shortage of labeled data. Also, the multi-input stream allows the model to do the learning on many features that can improve the results. To evaluate the proposed method, many features have been used. Also, from the experiments, the proposed method can segment lung infections with a high degree performance even with shortage of data and labeled images. In addition, comparing with the state-of-the-art method our method achieves good performance results.
Abstract:Although image inpainting, or the art of repairing the old and deteriorated images, has been around for many years, it has gained even more popularity because of the recent development in image processing techniques. With the improvement of image processing tools and the flexibility of digital image editing, automatic image inpainting has found important applications in computer vision and has also become an important and challenging topic of research in image processing. This paper is a brief review of the existing image inpainting approaches we first present a global vision on the existing methods for image inpainting. We attempt to collect most of the existing approaches and classify them into three categories, namely, sequential-based, CNN-based and GAN-based methods. In addition, for each category, a list of methods for the different types of distortion on the images is presented. Furthermore, collect a list of the available datasets and discuss these in our paper. This is a contribution for digital image inpainting researchers trying to look for the available datasets because there is a lack of datasets available for image inpainting. As the final step in this overview, we present the results of real evaluations of the three categories of image inpainting methods performed on the datasets used, for the different types of image distortion. In the end, we also present the evaluations metrics and discuss the performance of these methods in terms of these metrics. This overview can be used as a reference for image inpainting researchers, and it can also facilitate the comparison of the methods as well as the datasets used. The main contribution of this paper is the presentation of the three categories of image inpainting methods along with a list of available datasets that the researchers can use to evaluate their proposed methodology against.
Abstract:In recent years, various attempts have been proposed to explore the use of spatial and temporal information for human action recognition using convolutional neural networks (CNNs). However, only a small number of methods are available for the recognition of many human actions performed by more than one person in the same surveillance video. This paper proposes a novel technique for multiple human action recognition using a new architecture based on 3Dimdenisional deep learning with application to video surveillance systems. The first stage of the model uses a new representation of the data by extracting the sequence of each person acting in the scene. An analysis of each sequence to detect the corresponding actions is also proposed. KTH, Weizmann and UCF-ARG datasets were used for training, new datasets were also constructed which include a number of persons having multiple actions were used for testing the proposed algorithm. The results of this work revealed that the proposed method provides more accurate multi human action recognition achieving 98%. Other videos were used for the evaluation including datasets (UCF101, Hollywood2, HDMB51, and YouTube) without any preprocessing and the results obtained suggest that our proposed method clearly improves the performances when compared to state-of-the-art methods.