Artificial Intelligence Lab, Department of Computer Systems Engineering, University of Engineering and Applied Sciences
Abstract: Human Activity Recognition (HAR) plays a pivotal role in applications such as smart surveillance, healthcare, assistive technologies, and sports analytics. However, HAR systems still face critical challenges, including high computational costs, redundant features, and limited scalability in real-time scenarios. This work introduces an optimized hybrid deep learning framework that integrates a customized InceptionV3, an LSTM architecture, and a novel ensemble-based feature selection strategy. The proposed framework first extracts spatial descriptors using the customized InceptionV3 model, which captures multilevel contextual patterns, region homogeneity, and fine-grained localization cues. The temporal dependencies across frames are then modeled using LSTMs to effectively encode motion dynamics. Finally, an ensemble-based genetic algorithm with Adaptive Dynamic Fitness Sharing and Attention (ADFSA) is employed to select a compact and optimized feature set by dynamically balancing objectives such as accuracy, redundancy, uniqueness, and complexity reduction. Consequently, the selected feature subsets, which are both diverse and discriminative, enable various lightweight machine learning classifiers to achieve accurate and robust HAR in heterogeneous environments. The framework is evaluated on the UCF-YouTube dataset, which presents challenges such as occlusion, cluttered backgrounds, motion dynamics, and poor illumination. The proposed approach achieves 99.65% recognition accuracy, reduces the feature set to as few as 7 features, and improves inference time. The lightweight and scalable nature of the HAR system supports real-time deployment on edge devices such as Raspberry Pi, enabling practical applications in intelligent, resource-aware environments, including public safety, assistive technology, and autonomous monitoring systems.
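The fitness-sharing idea mentioned in the abstract above can be illustrated with a minimal sketch. This is the classic, fixed-radius form of fitness sharing applied to binary feature masks, not the authors' ADFSA variant, whose adaptive and attention components the abstract does not specify. The function names, the Hamming-distance niche measure, and the `sigma_share` and `alpha` parameters are assumptions made for illustration only.

```python
import numpy as np

def hamming_distance_matrix(population):
    """Pairwise Hamming distances between binary feature masks."""
    pop = np.asarray(population, dtype=int)
    return (pop[:, None, :] != pop[None, :, :]).sum(axis=2)

def shared_fitness(raw_fitness, population, sigma_share=2.0, alpha=1.0):
    """Classic fitness sharing: divide each raw fitness by its niche count.

    Individuals closer than sigma_share (hypothetical niche radius) share
    fitness, which penalizes crowded regions and keeps subsets diverse.
    """
    d = hamming_distance_matrix(population)
    # Sharing function: 1 at distance 0, decaying to 0 at sigma_share.
    sh = np.where(d < sigma_share, 1.0 - (d / sigma_share) ** alpha, 0.0)
    niche_count = sh.sum(axis=1)  # always >= 1 because each row includes itself
    return np.asarray(raw_fitness, dtype=float) / niche_count

# Two identical masks split their fitness; the distinct mask keeps its own.
masks = [[1, 0, 0, 0], [1, 0, 0, 0], [0, 1, 1, 1]]
scores = shared_fitness([1.0, 1.0, 1.0], masks)
```

In a genetic algorithm, selection would then operate on the shared scores rather than the raw ones, so duplicated feature subsets are less likely to dominate the population.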
Abstract: Brain tumors remain among the most lethal human diseases, and early detection and accurate classification are critical for effective diagnosis and treatment planning. Although deep learning-based computer-aided diagnostic (CADx) systems have shown remarkable progress, conventional convolutional neural networks (CNNs) and Transformers face persistent challenges, including high computational cost, sensitivity to minor contrast variations, structural heterogeneity, and texture inconsistencies in MRI data. Therefore, a novel hybrid framework, CE-RS-SBCIT, is introduced, integrating residual and spatial learning-based CNNs with transformer-driven modules. The proposed framework exploits local fine-grained and global contextual cues through four core innovations: (i) a smoothing and boundary-based CNN-integrated Transformer (SBCIT), (ii) tailored residual and spatial learning CNNs, (iii) a channel enhancement (CE) strategy, and (iv) a novel spatial attention mechanism. The developed SBCIT employs stem convolution and contextual interaction transformer blocks with systematic smoothing and boundary operations, enabling efficient global feature modeling. Moreover, residual and spatial CNNs, enhanced by auxiliary transfer-learned feature maps, enrich the representation space, while the CE module amplifies discriminative channels and mitigates redundancy. Furthermore, the spatial attention mechanism selectively emphasizes subtle contrast and textural variations across tumor classes. Extensive evaluation on challenging MRI datasets from Kaggle and Figshare, encompassing glioma, meningioma, pituitary tumors, and healthy controls, demonstrates superior performance, achieving 98.30% accuracy, 98.08% sensitivity, 98.25% F1-score, and 98.43% precision.
Abstract: The Rate of Penetration (ROP) is crucial for optimizing drilling operations; however, accurately predicting it is hindered by the complex, dynamic, and high-dimensional nature of drilling data. Traditional empirical, physics-based, and basic machine learning models often fail to capture intricate temporal and contextual relationships, resulting in suboptimal predictions and limited real-time utility. To address this gap, we propose a novel hybrid deep learning architecture integrating Long Short-Term Memory (LSTM) networks, Transformer encoders, Time-Series Mixer (TS-Mixer) blocks, and attention mechanisms to jointly model temporal dependencies, static feature interactions, global context, and dynamic feature importance. Evaluated on a real-world drilling dataset using standard regression metrics (R-squared, MAE, RMSE, MAPE), our model outperformed benchmarks (standalone LSTM, TS-Mixer, and simpler hybrids), achieving an R-squared score of 0.9988 and a Mean Absolute Percentage Error of 1.447%. Model interpretability was ensured using SHAP and LIME, while actual vs. predicted curves and bias checks confirmed accuracy and fairness across scenarios. This advanced hybrid approach enables reliable real-time ROP prediction, paving the way for intelligent, cost-effective drilling optimization systems with significant operational impact.
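For readers unfamiliar with the four regression metrics named in the abstract above, the following minimal sketch computes them with NumPy using their standard textbook definitions. The function name and the toy values are illustrative assumptions and are unrelated to the drilling dataset or to the authors' evaluation code.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R-squared, MAE, RMSE, and MAPE (in percent) for two 1-D arrays.

    MAPE divides by y_true, so it assumes no true value is zero.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    ss_res = np.sum(err ** 2)                         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": np.mean(np.abs(err)),
        "rmse": np.sqrt(np.mean(err ** 2)),
        "mape": 100.0 * np.mean(np.abs(err / y_true)),
    }

# Toy values chosen only to make the arithmetic easy to follow.
metrics = regression_metrics([10, 20, 30, 40], [11, 19, 33, 38])
```

R-squared and MAPE are scale-free, which is why they are convenient headline numbers, whereas MAE and RMSE are expressed in the units of the target variable.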
Abstract: Deep Convolutional Neural Networks (CNNs) have significantly advanced deep learning, driving breakthroughs in computer vision, natural language processing, medical diagnosis, object detection, and speech recognition. Architectural innovations including 1D, 2D, and 3D convolutional models, dilated and grouped convolutions, depthwise separable convolutions, and attention mechanisms address domain-specific challenges and enhance feature representation and computational efficiency. Structural refinements such as spatial-channel exploitation, multi-path design, and feature-map enhancement contribute to robust hierarchical feature extraction and improved generalization, particularly through transfer learning. Efficient preprocessing strategies, including Fourier transforms, structured transforms, low-precision computation, and weight compression, optimize inference speed and facilitate deployment in resource-constrained environments. This survey presents a unified taxonomy that classifies CNN architectures based on spatial exploitation, multi-path structures, depth, width, dimensionality expansion, channel boosting, and attention mechanisms. It systematically reviews CNN applications in face recognition, pose estimation, action recognition, text classification, statistical language modeling, disease diagnosis, radiological analysis, cryptocurrency sentiment prediction, 1D data processing, video analysis, and speech recognition. In addition to consolidating architectural advancements, the review highlights emerging learning paradigms such as few-shot, zero-shot, weakly supervised, and federated learning. It also outlines future research directions, including hybrid CNN-transformer models, vision-language integration, and generative learning. This review provides a comprehensive perspective on the evolution of CNNs from 2015 to 2025, outlining key innovations, challenges, and opportunities.
Abstract: Recent advances in detecting tumors using deep learning on breast ultrasound images (BUSI) have demonstrated significant success. Deep CNNs and vision transformers (ViTs) have each shown promising initial performance. However, challenges related to model complexity and to variations in contrast, texture, and tumor morphology introduce uncertainties that hinder the effectiveness of current methods. This study introduces a novel hybrid framework, CB-Res-RBCMT, combining customized residual CNNs and new ViT components for detailed BUSI cancer analysis. The proposed RBCMT uses stem convolution blocks with CNN Meet Transformer (CMT) blocks, followed by new regional and boundary (RB) feature extraction operations for capturing contrast and morphological variations. Moreover, the CMT block incorporates global contextual interactions through multi-head attention, enhancing computational efficiency with a lightweight design. Additionally, the customized inverse residual and stem CNNs within the CMT effectively extract local texture information and handle vanishing gradients. Finally, the new channel-boosted (CB) strategy enriches the feature diversity of the limited dataset by combining the original RBCMT channels with transfer learning-based residual CNN-generated maps. These diverse channels are processed through a spatial attention block for optimal pixel selection, reducing redundancy and improving the discrimination of minor contrast and texture variations. The proposed CB-Res-RBCMT achieves an F1-score of 95.57%, accuracy of 95.63%, sensitivity of 96.42%, and precision of 94.79% on the standard, harmonized, stringent BUSI dataset, outperforming existing ViT and CNN methods. These results demonstrate the versatility of our integrated CNN-Transformer framework in capturing diverse features and delivering superior performance in BUSI cancer diagnosis.
Abstract: Monkeypox (MPox) has emerged as a significant global concern, with case counts rising steadily. Conventional detection methods, including polymerase chain reaction (PCR) and manual examination, suffer from low sensitivity, high cost, and substantial workload. Deep learning offers an automated alternative; however, the available datasets pose challenges of data scarcity, texture and contrast variation, inter- and intra-class variability, and similarity to other infectious skin diseases. In this regard, a novel hybrid approach is proposed that integrates the learning capacity of a Residual Learning and Spatial Exploitation Convolutional Neural Network (CNN) with a customized Swin Transformer (RS-FME-SwinT) to capture multi-scale global and local correlated features for MPox diagnosis. The proposed RS-FME-SwinT technique employs a transfer learning-based feature map enhancement (FME) technique, integrating the customized SwinT for global information capture, residual blocks for texture extraction, and spatial blocks for local contrast variations. Moreover, incorporating new inverse residual blocks within the proposed SwinT effectively captures local patterns and mitigates vanishing gradients. The proposed RS-FME-SwinT learns diverse features that systematically reduce intra-class MPox variation and enable precise discrimination from other skin diseases. Finally, the proposed RS-FME-SwinT is evaluated with holdout cross-validation on a diverse MPox dataset and outperforms state-of-the-art CNNs and ViTs. It achieves an accuracy of 97.80%, sensitivity of 96.82%, precision of 98.06%, and an F-score of 97.44% in MPox detection. The RS-FME-SwinT could be a valuable tool for healthcare practitioners, enabling prompt and accurate MPox diagnosis and contributing significantly to mitigation efforts.
Abstract: Medical image segmentation plays a crucial role in various healthcare applications, enabling accurate diagnosis, treatment planning, and disease monitoring. In recent years, Vision Transformers (ViTs) have emerged as a promising technique for addressing the challenges in medical image segmentation. In medical images, structures are usually highly interconnected and globally distributed. ViTs utilize their multi-scale attention mechanism to model the long-range relationships in the images. However, they lack image-related inductive bias and translational invariance, which can limit their performance. Recently, researchers have proposed various ViT-based approaches that incorporate CNNs in their architectures, known as Hybrid Vision Transformers (HVTs), to capture local correlation in addition to the global information in the images. This survey paper provides a detailed review of the recent advancements in ViTs and HVTs for medical image segmentation. Along with the categorization of ViT and HVT-based medical image segmentation approaches, we also present a detailed overview of their real-time applications in several medical image modalities. This survey may serve as a valuable resource for researchers, healthcare practitioners, and students in understanding the state-of-the-art approaches for ViT-based medical image segmentation.
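The way attention lets every image patch relate to every other patch, which the abstract above credits for modeling long-range relationships, can be sketched with generic single-head scaled dot-product self-attention. This is the standard textbook formulation, not any specific ViT or HVT from the survey. The function name, the random projection matrices, and the toy sizes are assumptions for illustration.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence.

    x: (n_tokens, d_model) patch embeddings; w_q, w_k, w_v: (d_model, d_head).
    Returns the attended tokens and the (n_tokens, n_tokens) weight matrix.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights

# Five hypothetical patch embeddings of width 8, projected to a head of width 4.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
attended, weights = self_attention(x, w_q, w_k, w_v)
```

Each row of `weights` spans all patches regardless of their spatial distance, whereas a convolution only mixes a fixed local neighborhood; hybrid designs combine the two.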
Abstract: Monkeypox (MP) is a zoonotic infectious disease caused by the monkeypox virus, a member of the Orthopoxvirus genus in the family Poxviridae. It was first identified in Africa and gained global attention in mid-2022, when cases were reported outside endemic areas. Symptoms include headache, chills, fever, and skin manifestations resembling those of smallpox, measles, and chickenpox, and the WHO declared monkeypox a public health emergency of international concern in July 2022. Timely diagnosis is imperative for assessing disease severity, conducting clinical evaluations, and determining suitable treatment plans. PCR testing of skin lesions is regarded by the WHO as the benchmark for primary diagnosis, with symptom management as the primary treatment and antiviral drugs such as tecovirimat for severe cases. However, manual analysis within hospitals poses a substantial challenge during public health emergencies, particularly epidemics and pandemics. Therefore, this survey paper provides an extensive analysis of deep learning (DL) methods for the automatic detection of MP in skin lesion images. These DL techniques are broadly grouped into categories including deep CNNs, deep CNN ensembles, deep hybrid learning, newly developed models, and Vision Transformers for diagnosing MP. Additionally, the paper addresses benchmark datasets and their collection from various authentic sources, pre-processing techniques, and evaluation metrics. The survey also briefly covers emerging concepts, identifies research gaps, limitations, and applications, and outlines challenges in the diagnosis process. It offers valuable insights into prospective areas of DL research and is expected to serve as a guide for researchers.
Abstract: COVID-19 is a disease caused by a new pathogen that first appeared in the human population at the end of 2019, and infection can lead to novel variants of pneumonia. It is a rapidly spreading infectious disease. Therefore, efficient diagnostic systems that accurately identify infected patients can help control its spread. In this regard, a new two-stage analysis framework is developed to analyze minute irregularities of COVID-19 infection. In the first stage, a novel detection Convolutional Neural Network (CNN), STM-BRNet, is developed that incorporates the Split-Transform-Merge (STM) block and channel boosting (CB) to identify COVID-19 infected CT slices. Each STM block extracts boundary and region-smoothing-specific features for COVID-19 infection detection. Moreover, the various boosted channels are obtained by introducing the new CB and Transfer Learning (TL) concept in STM blocks to capture small illumination and texture variations of COVID-19-specific images. In the second stage, the COVID-19 CT slices are passed to a new segmentation CNN, SA-CB-BRSeg, which delineates the infected regions. SA-CB-BRSeg systematically applies smoothing and heterogeneous operations in the encoder and decoder to simultaneously capture COVID-19-specific patterns, namely region homogeneity, texture variation, and boundaries. Additionally, the new CB concept is introduced in the decoder of SA-CB-BRSeg by combining additional channels using TL to learn the low-contrast region. The proposed STM-BRNet achieves an accuracy of 98.01%, recall of 98.12%, and F-score of 98.11%, while SA-CB-BRSeg achieves a Dice similarity of 96.396% and an IoU of 98.845% for the COVID-19 infectious region. The proposed two-stage framework significantly improves performance compared to single-phase and other reported systems and reduces the burden on radiologists.
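The two segmentation measures reported in the abstract above, Dice similarity and IoU, can be computed from a predicted and a reference binary mask as in the minimal sketch below. It uses the standard definitions, and the function name, the empty-mask convention, and the toy masks are illustrative assumptions rather than the authors' evaluation code.

```python
import numpy as np

def dice_and_iou(pred_mask, true_mask):
    """Dice similarity and intersection-over-union for two binary masks.

    Convention assumed here: two empty masks count as a perfect match (1.0).
    """
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    inter = np.logical_and(pred, true).sum()   # pixels marked in both masks
    union = np.logical_or(pred, true).sum()    # pixels marked in either mask
    total = pred.sum() + true.sum()
    dice = 2.0 * inter / total if total else 1.0
    iou = inter / union if union else 1.0
    return float(dice), float(iou)

# Toy 2x3 masks: 3 predicted pixels, 3 reference pixels, 2 in common.
pred = [[1, 1, 0], [0, 1, 0]]
true = [[1, 0, 0], [0, 1, 1]]
dice, iou = dice_and_iou(pred, true)
```

Both measures ignore true-negative background pixels, which makes them better suited than plain accuracy to small infected regions in a large CT slice.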
Abstract: Security is under threat in many types of networks, especially in the Internet of Things (IoT) environment, where early detection is required. The IoT is a network of real-time devices, such as home automation systems, that can be controlled by open-source Android devices, which makes it an open ground for attackers. Attackers can access the network, initiate different kinds of security breaches, and compromise network control. Therefore, timely detection of the increasing number of sophisticated malware attacks is a key challenge in ensuring credible network protection. In this regard, we have developed a new malware detection framework, Deep Squeezed-Boosted and Ensemble Learning (DSBEL), comprising a novel Squeezed-Boosted Boundary-Region Split-Transform-Merge (SB-BR-STM) CNN and ensemble learning. The proposed STM block employs multi-path dilated convolutional, boundary, and regional operations to capture the homogeneous and heterogeneous global malicious patterns. Moreover, diverse feature maps are obtained using transfer learning and multi-path-based squeezing and boosting at the initial and final levels to learn minute pattern variations. Finally, the boosted discriminative features are extracted from the developed deep SB-BR-STM CNN and provided to the ensemble classifiers (SVM, MLP, and AdaBoostM1) to improve the hybrid learning generalization. The proposed DSBEL framework and SB-BR-STM CNN are evaluated against existing techniques on the IOT_Malware dataset using standard performance measures. Evaluation results show 98.50% accuracy, 97.12% F1-score, 91.91% MCC, 95.97% recall, and 98.42% precision. The proposed malware analysis framework is helpful for the timely detection of malicious activity and suggests future strategies.
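The classification measures listed in the abstract above (accuracy, precision, recall, F1-score, and MCC) all derive from the four cells of a binary confusion matrix. The sketch below shows the standard formulas. The function name and the example counts are assumptions for illustration and do not come from the IOT_Malware experiments.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts.

    Undefined ratios (zero denominators) are reported as 0.0 by convention.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Matthews correlation coefficient uses all four cells, so it stays
    # informative even when benign and malicious classes are imbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}

# Hypothetical counts: 90 detected malware, 10 false alarms, 5 misses, 95 benign.
m = binary_metrics(tp=90, fp=10, fn=5, tn=95)
```

MCC ranges from -1 to 1 and is often reported as a percentage, which is how a value such as 91.91% should be read.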