Abstract: Potholes cause vehicle damage and traffic accidents, creating serious safety and economic problems; early and accurate detection of potholes is therefore crucial. Existing detection methods are usually based only on 2D RGB images and cannot accurately analyze the physical characteristics of potholes. In this paper, a publicly available dataset of RGB-D images (PothRGBD) is created and an improved YOLOv8-based model is proposed for both pothole detection and analysis of pothole physical features. An Intel RealSense D415 depth camera was used to collect RGB and depth data from road surfaces, resulting in a PothRGBD dataset of 1000 images, which were labeled in YOLO segmentation format. A novel model is proposed based on the YOLOv8n-seg architecture, structurally improved with Dynamic Snake Convolution (DSConv), the Simple Attention Module (SimAM), and the Gaussian Error Linear Unit (GELU). The proposed model segments potholes with irregular edge structures more accurately and performs perimeter and depth measurements on depth maps with high accuracy. The standard YOLOv8n-seg model achieved 91.9% precision, 85.2% recall, and 91.9% mAP@50; with the proposed model, these values increased to 93.7%, 90.4%, and 93.8%, respectively, corresponding to relative improvements of 1.96% in precision, 6.13% in recall, and 2.07% in mAP@50. The proposed model performs pothole detection as well as perimeter and depth measurement with high accuracy and, owing to its low model complexity, is suitable for real-time applications. The result is a lightweight and effective model that can be used in deep learning-based intelligent transportation solutions.
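For illustration, a minimal PyTorch sketch of the parameter-free SimAM attention named above is given below. It follows the energy-based formulation of the original SimAM paper; the lambda value and any particular insertion point inside the YOLOv8n-seg backbone are assumptions for illustration, not details taken from the abstract.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free Simple Attention Module (SimAM).

    Reweights each activation by an inverse energy term measuring how much
    it deviates from its channel's spatial mean (salient pixels get boosted).
    """
    def __init__(self, e_lambda: float = 1e-4):  # regularizer value: assumed default
        super().__init__()
        self.e_lambda = e_lambda

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        n = h * w - 1
        # squared deviation of each activation from its channel's spatial mean
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        # channel-wise variance estimate
        v = d.sum(dim=(2, 3), keepdim=True) / n
        # inverse energy: strongly deviating activations receive larger weights
        e_inv = d / (4 * (v + self.e_lambda)) + 0.5
        return x * torch.sigmoid(e_inv)

# usage on a hypothetical backbone feature map
attn = SimAM()
y = attn(torch.randn(1, 64, 80, 80))
```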
Abstract: Hypertensive retinopathy (HR) is a severe eye disease that may cause permanent vision loss if not diagnosed early. Traditional diagnostic methods are time-consuming and subjective, highlighting the need for an automated, reliable system. Existing studies often rely on a single Deep Learning (DL) model and struggle to distinguish HR stages. This study introduces a three-stage approach to enhance HR diagnosis accuracy. Initially, 14 CNN models were tested, identifying DenseNet169, MobileNet, and ResNet152 as the most effective. DenseNet169 achieved 87.73% accuracy, 87.75% precision, 87.73% recall, an 87.67% F1-score, and a Cohen's kappa of 0.8359. MobileNet followed with 86.40% accuracy, 86.60% precision, 86.40% recall, an 86.31% F1-score, and a Cohen's kappa of 0.8180. ResNet152 ranked third with 85.87% accuracy, 86.01% precision, 85.87% recall, an 85.83% F1-score, and a Cohen's kappa of 0.8188. In the second stage, deep features from these three models were fused and classified using Machine Learning (ML) algorithms (SVM, RF, XGBoost). SVM with a sigmoid kernel performed best, with 92.00% accuracy, 91.93% precision, 92.00% recall, a 91.91% F1-score, and a Cohen's kappa of 0.8930. The third stage applied meta-heuristic optimization (GA, ABC, PSO, HHO) for feature selection; HHO yielded 94.66% accuracy, precision, and recall, a 94.64% F1-score, and a Cohen's kappa of 0.9286. The proposed approach surpassed single CNN models and previous studies in HR diagnosis accuracy and generalization.
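A minimal sketch of the second-stage fusion pipeline follows, assuming ImageNet-pretrained Keras backbones, global average pooling as the feature extractor, and 224x224 inputs; none of these choices are stated in the abstract, and the training arrays below are hypothetical placeholders standing in for the preprocessed fundus images and HR stage labels.

```python
import numpy as np
from tensorflow.keras.applications import DenseNet169, MobileNet, ResNet152
from tensorflow.keras.layers import GlobalAveragePooling2D
from tensorflow.keras.models import Model
from sklearn.svm import SVC

def make_extractor(backbone_cls):
    # headless ImageNet backbone followed by global average pooling (assumed setup;
    # per-backbone preprocess_input steps are omitted for brevity)
    base = backbone_cls(weights="imagenet", include_top=False,
                        input_shape=(224, 224, 3))
    return Model(base.input, GlobalAveragePooling2D()(base.output))

extractors = [make_extractor(m) for m in (DenseNet169, MobileNet, ResNet152)]

def fused_features(images):
    # concatenate the three deep feature vectors extracted for each image
    return np.concatenate([e.predict(images, verbose=0) for e in extractors], axis=1)

# hypothetical placeholders for preprocessed fundus images and HR stage labels
X_train = np.random.rand(8, 224, 224, 3).astype("float32")
y_train = np.array([0, 1, 2, 3, 0, 1, 2, 3])

clf = SVC(kernel="sigmoid")  # sigmoid-kernel SVM, the best second-stage classifier
clf.fit(fused_features(X_train), y_train)
```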
Abstract: Glaucoma is a prevalent eye disease that progresses silently, without symptoms. If not detected and treated early, it can cause permanent vision loss. Computer-assisted diagnosis systems play a crucial role in timely and efficient identification. This study introduces MaxGlaViT, a lightweight model based on a restructured Multi-Axis Vision Transformer (MaxViT) for early glaucoma detection. First, MaxViT was scaled to optimize the number of blocks and channels, yielding a lighter architecture. Second, the stem was enhanced by adding attention mechanisms (CBAM, ECA, SE) after its convolution layers to improve feature learning. Third, the MBConv structures in the MaxViT blocks were replaced by advanced DL blocks (ConvNeXt, ConvNeXtV2, InceptionNeXt). The model was evaluated on the HDV1 dataset, which contains fundus images of different glaucoma stages. Additionally, 40 CNN and 40 ViT models were tested on HDV1 to validate MaxGlaViT's efficiency. Among the CNN models, EfficientNet-B6 achieved the highest accuracy (84.91%), while among the ViT models, MaxViT-Tiny performed best (86.42%). The scaled MaxViT reached 87.93% accuracy; adding ECA to the stem block increased accuracy to 89.01%, and replacing MBConv with ConvNeXtV2 further improved it to 89.87%. Finally, integrating ECA in the stem and ConvNeXtV2 in the MaxViT blocks resulted in 92.03% accuracy. By testing 80 DL models for glaucoma stage classification, this study presents a comprehensive comparative analysis. MaxGlaViT outperforms both the experimental and state-of-the-art models, achieving 92.03% accuracy, 92.33% precision, 92.03% recall, a 92.13% F1-score, and an 87.12% Cohen's kappa score.
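For reference, below is a minimal PyTorch sketch of the ECA attention that produced the best stem variant. It follows the standard ECA-Net formulation (a 1D convolution over pooled channel descriptors); the kernel size, channel widths, and the exact placement after the stem convolutions are illustrative assumptions, not details taken from the abstract.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: 1D conv over globally pooled channel descriptors."""
    def __init__(self, k_size: int = 3):  # kernel size: assumed fixed here
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size,
                              padding=k_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.pool(x)                              # (B, C, 1, 1)
        y = self.conv(y.squeeze(-1).transpose(1, 2))  # local cross-channel interaction
        y = torch.sigmoid(y.transpose(1, 2).unsqueeze(-1))
        return x * y                                  # channel-wise reweighting

# illustrative stem: convolution layers followed by ECA, mirroring the enhanced stem
stem = nn.Sequential(
    nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.GELU(),
    nn.Conv2d(64, 64, 3, stride=1, padding=1),
    ECA(),
)
out = stem(torch.randn(1, 3, 224, 224))
```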
Abstract: Detecting machine malfunctions at an early stage is crucial for reducing interruptions in operational processes within industrial settings. Recently, deep learning approaches have become the preferred choice for detecting machine failures, as automatic feature extraction makes them an effective solution for fault detection. In this study, a deep learning-based system was designed to analyze the sound signals produced by industrial machines. Acoustic signals were converted into Mel spectrograms, and the resulting spectrogram images were classified with DenseNet-169, a deep learning architecture recognized for its effectiveness in image classification tasks. The model was trained via transfer learning on the MIMII dataset, which includes sounds from four types of industrial machines. The results show that the proposed method reaches accuracies between 97.17% and 99.87% across the dataset's signal-to-noise ratio (SNR) levels.
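A minimal sketch of this pipeline follows, assuming librosa for the Mel-spectrogram step and a Keras DenseNet169 with a frozen ImageNet backbone for transfer learning; the sampling rate, Mel-band count, two-class head, and 224x224 input size are illustrative assumptions (image resizing before training is also omitted).

```python
import numpy as np
import librosa
from tensorflow.keras.applications import DenseNet169
from tensorflow.keras import layers, models

def wav_to_mel_image(path, sr=16000, n_mels=128):
    """Load a machine-sound clip and convert it to a 3-channel Mel-spectrogram image."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    # scale to 0-255 and replicate to 3 channels for an ImageNet-pretrained backbone
    img = 255 * (mel_db - mel_db.min()) / (mel_db.max() - mel_db.min() + 1e-8)
    return np.repeat(img[..., None], 3, axis=-1).astype(np.uint8)

# transfer learning: frozen ImageNet backbone with a new classification head
base = DenseNet169(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(2, activation="softmax"),  # normal vs. abnormal (assumed label scheme)
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```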