The Red Palm Weevil (RPW), also known as the palm weevil, is considered among the world's most damaging insect pests of palms. Current detection techniques include the detection of symptoms of RPW using visual or sound inspection and chemical detection of volatile signatures generated by infested palm trees. However, efficient detection of RPW diseases at an early stage is considered one of the most challenging issues for cultivating date palms. In this paper, an efficient approach to the early detection of RPW is proposed. The proposed approach is based on RPW sound activities being recorded and analyzed. The first step involves the conversion of sound data into images based on a selected set of features. The second step involves the combination of images from the same sound file but computed by different features into a single image. The third step involves the application of different Deep Learning (DL) techniques to classify resulting images into two classes: infested and not infested. Experimental results show good performances of the proposed approach for RPW detection using different DL techniques, namely MobileNetV2, ResNet50V2, ResNet152V2, VGG16, VGG19, DenseNet121, DenseNet201, Xception, and InceptionV3. The proposed approach outperformed existing techniques for public datasets.
In this paper, we argue that the next generation of robots can be commanded using only Language Models' prompts. Every prompt interrogates separately a specific Robotic Modality via its Modality Language Model (MLM). A central Task Modality mediates the whole communication to execute the robotic mission via a Large Language Model (LLM). This paper gives this new robotic design pattern the name of: Prompting Robotic Modalities (PRM). Moreover, this paper applies this PRM design pattern in building a new robotic framework named ROSGPT_Vision. ROSGPT_Vision allows the execution of a robotic task using only two prompts: a Visual and an LLM prompt. The Visual Prompt extracts, in natural language, the visual semantic features related to the task under consideration (Visual Robotic Modality). Meanwhile, the LLM Prompt regulates the robotic reaction to the visual description (Task Modality). The framework automates all the mechanisms behind these two prompts. The framework enables the robot to address complex real-world scenarios by processing visual data, making informed decisions, and carrying out actions automatically. The framework comprises one generic vision module and two independent ROS nodes. As a test application, we used ROSGPT_Vision to develop CarMate, which monitors the driver's distraction on the roads and makes real-time vocal notifications to the driver. We showed how ROSGPT_Vision significantly reduced the development cost compared to traditional methods. We demonstrated how to improve the quality of the application by optimizing the prompting strategies, without delving into technical details. ROSGPT_Vision is shared with the community (link: https://github.com/bilel-bj/ROSGPT_Vision) to advance robotic research in this direction and to build more robotic frameworks that implement the PRM design pattern and enables controlling robots using only prompts.
The Red Palm Weevil (RPW) is a highly destructive insect causing economic losses and impacting palm tree farming worldwide. This paper proposes an innovative approach for sustainable palm tree farming by utilizing advanced technologies for the early detection and management of RPW. Our approach combines computer vision, deep learning (DL), the Internet of Things (IoT), and geospatial data to detect and classify RPW-infested palm trees effectively. The main phases include; (1) DL classification using sound data from IoT devices, (2) palm tree detection using YOLOv8 on UAV images, and (3) RPW mapping using geospatial data. Our custom DL model achieves 100% precision and recall in detecting and localizing infested palm trees. Integrating geospatial data enables the creation of a comprehensive RPW distribution map for efficient monitoring and targeted management strategies. This technology-driven approach benefits agricultural authorities, farmers, and researchers in managing RPW infestations and safeguarding palm tree plantations' productivity.
Medical imaging is an essential tool for diagnosing various healthcare diseases and conditions. However, analyzing medical images is a complex and time-consuming task that requires expertise and experience. This article aims to design a decision support system to assist healthcare providers and patients in making decisions about diagnosing, treating, and managing health conditions. The proposed architecture contains three stages: 1) data collection and labeling, 2) model training, and 3) diagnosis report generation. The key idea is to train a deep learning model on a medical image dataset to extract four types of information: the type of image scan, the body part, the test image, and the results. This information is then fed into ChatGPT to generate automatic diagnostics. The proposed system has the potential to enhance decision-making, reduce costs, and improve the capabilities of healthcare providers. The efficacy of the proposed system is analyzed by conducting extensive experiments on a large medical image dataset. The experimental outcomes exhibited promising performance for automatic diagnosis through medical images.
Since the release of ChatGPT, numerous studies have highlighted the remarkable performance of ChatGPT, which often rivals or even surpasses human capabilities in various tasks and domains. However, this paper presents a contrasting perspective by demonstrating an instance where human performance excels in typical tasks suited for ChatGPT, specifically in the domain of computer programming. We utilize the IEEExtreme Challenge competition as a benchmark, a prestigious, annual international programming contest encompassing a wide range of problems with different complexities. To conduct a thorough evaluation, we selected and executed a diverse set of 102 challenges, drawn from five distinct IEEExtreme editions, using three major programming languages: Python, Java, and C++. Our empirical analysis provides evidence that contrary to popular belief, human programmers maintain a competitive edge over ChatGPT in certain aspects of problem-solving within the programming context. In fact, we found that the average score obtained by ChatGPT on the set of IEEExtreme programming problems is 3.9 to 5.8 times lower than the average human score, depending on the programming language. This paper elaborates on these findings, offering critical insights into the limitations and potential areas of improvement for AI-based language models like ChatGPT.
Image Dehazing aims to remove atmospheric fog or haze from an image. Although the Dehazing models have evolved a lot in recent years, few have precisely tackled the problem of High-Resolution hazy images. For this kind of image, the model needs to work on a downscaled version of the image or on cropped patches from it. In both cases, the accuracy will drop. This is primarily due to the inherent failure to combine global and local features when the image size increases. The Dehazing model requires global features to understand the general scene peculiarities and the local features to work better with fine and pixel details. In this study, we propose the Streamlined Global and Local Features Combinator (SGLC) to solve these issues and to optimize the application of any Dehazing model to High-Resolution images. The SGLC contains two successive blocks. The first is the Global Features Generator (GFG) which generates the first version of the Dehazed image containing strong global features. The second block is the Local Features Enhancer (LFE) which improves the local feature details inside the previously generated image. When tested on the Uformer architecture for Dehazing, SGLC increased the PSNR metric by a significant margin. Any other model can be incorporated inside the SGLC process to improve its efficiency on High-Resolution input data.
Ambient computing is gaining popularity as a major technological advancement for the future. The modern era has witnessed a surge in the advancement in healthcare systems, with viable radio frequency solutions proposed for remote and unobtrusive human activity recognition (HAR). Specifically, this study investigates the use of Wi-Fi channel state information (CSI) as a novel method of ambient sensing that can be employed as a contactless means of recognizing human activity in indoor environments. These methods avoid additional costly hardware required for vision-based systems, which are privacy-intrusive, by (re)using Wi-Fi CSI for various safety and security applications. During an experiment utilizing universal software-defined radio (USRP) to collect CSI samples, it was observed that a subject engaged in six distinct activities, which included no activity, standing, sitting, and leaning forward, across different areas of the room. Additionally, more CSI samples were collected when the subject walked in two different directions. This study presents a Wi-Fi CSI-based HAR system that assesses and contrasts deep learning approaches, namely convolutional neural network (CNN), long short-term memory (LSTM), and hybrid (LSTM+CNN), employed for accurate activity recognition. The experimental results indicate that LSTM surpasses current models and achieves an average accuracy of 95.3% in multi-activity classification when compared to CNN and hybrid techniques. In the future, research needs to study the significance of resilience in diverse and dynamic environments to identify the activity of multiple users.
Smart traffic engineering and intelligent transportation services are in increasing demand from governmental authorities to optimize traffic performance and thus reduce energy costs, increase the drivers' safety and comfort, ensure traffic laws enforcement, and detect traffic violations. In this paper, we address this challenge, and we leverage the use of Artificial Intelligence (AI) and Unmanned Aerial Vehicles (UAVs) to develop an AI-integrated video analytics framework, called TAU (Traffic Analysis from UAVs), for automated traffic analytics and understanding. Unlike previous works on traffic video analytics, we propose an automated object detection and tracking pipeline from video processing to advanced traffic understanding using high-resolution UAV images. TAU combines six main contributions. First, it proposes a pre-processing algorithm to adapt the high-resolution UAV image as input to the object detector without lowering the resolution. This ensures an excellent detection accuracy from high-quality features, particularly the small size of detected objects from UAV images. Second, it introduces an algorithm for recalibrating the vehicle coordinates to ensure that vehicles are uniquely identified and tracked across the multiple crops of the same frame. Third, it presents a speed calculation algorithm based on accumulating information from successive frames. Fourth, TAU counts the number of vehicles per traffic zone based on the Ray Tracing algorithm. Fifth, TAU has a fully independent algorithm for crossroad arbitration based on the data gathered from the different zones surrounding it. Sixth, TAU introduces a set of algorithms for extracting twenty-four types of insights from the raw data collected. The code is shared here: https://github.com/bilel-bj/TAU. Video demonstrations are provided here: https://youtu.be/wXJV0H7LviU and here: https://youtu.be/kGv0gmtVEbI.
We present Covy -- a robotic platform that promotes social distancing during pandemics like COVID-19. Covy features a novel compound vision system that enables it to detect social distancing breaches up to 16m away. Covy navigates its surroundings autonomously using a hybrid navigation stack that combines Deep Reinforcement Learning (DRL)and a probabilistic localization method. We built the complete system and evaluated Covy's performance through extensive sets of experiments both in simulated and realistic environments. Amongst others, our results show that the hybrid navigation stack is more robust compared to a pure DRL-based solution.
With the number of vehicles continuously increasing, parking monitoring and analysis are becoming a substantial feature of modern cities. In this study, we present a methodology to monitor car parking areas and to analyze their occupancy in real-time. The solution is based on a combination between image analysis and deep learning techniques. It incorporates four building blocks put inside a pipeline: vehicle detection, vehicle tracking, manual annotation of parking slots, and occupancy estimation using the Ray Tracing algorithm. The aim of this methodology is to optimize the use of parking areas and to reduce the time wasted by daily drivers to find the right parking slot for their cars. Also, it helps to better manage the space of the parking areas and to discover misuse cases. A demonstration of the provided solution is shown in the following video link: https://www.youtube.com/watch?v=KbAt8zT14Tc.