Abstract:The rapid growth of the global population, alongside exponential technological advancement, has intensified the demand for food production. Meeting this demand depends not only on increasing agricultural yield but also on minimizing food loss caused by crop diseases. Diseases account for a substantial portion of apple production losses, despite apples being among the most widely produced and nutritionally valuable fruits worldwide. Previous studies have employed machine learning techniques for feature extraction and early diagnosis of apple leaf diseases, and more recently, deep learning-based models have shown remarkable performance in disease recognition. However, most state-of-the-art deep learning models are highly parameter-intensive, resulting in increased training and inference time. Although lightweight models are more suitable for user-friendly and resource-constrained applications, they often suffer from performance degradation. To address the trade-off between efficiency and performance, we propose Mam-App, a parameter-efficient Mamba-based model for feature extraction and leaf disease classification. The proposed approach achieves competitive state-of-the-art performance on the PlantVillage Apple Leaf Disease dataset, attaining 99.58% accuracy, 99.30% precision, 99.14% recall, and a 99.22% F1-score, while using only 0.051M parameters. This extremely low parameter count makes the model suitable for deployment on drones, mobile devices, and other low-resource platforms. To demonstrate the robustness and generalizability of the proposed model, we further evaluate it on the PlantVillage Corn Leaf Disease and Potato Leaf Disease datasets. The model achieves 99.48%, 99.20%, 99.34%, and 99.27% accuracy, precision, recall, and F1-score on the corn dataset and 98.46%, 98.91%, 95.39%, and 97.01% on the potato dataset, respectively.
Abstract:With the rapid advancements in machine learning, models have become increasingly capable of learning and making predictions in various industries. However, deploying these models in critical infrastructures presents a major challenge, as concerns about data privacy prevent unrestricted data sharing. Homomorphic encryption (HE) offers a solution by enabling computations on encrypted data, but it remains incompatible with machine learning models like convolutional neural networks (CNNs), due to their reliance on non-linear activation functions. To bridge this gap, this work proposes an optimized framework that replaces standard non-linear functions with homomorphically compatible approximations, ensuring secure computations while minimizing computational overhead. The proposed approach restructures the CNN architecture and introduces an efficient activation function approximation method to mitigate the performance trade-offs introduced by encryption. Experiments on CIFAR-10 achieve 94.4% accuracy with 2.42 s per single encrypted sample and 24,000 s per 10,000 encrypted samples, using a degree-4 polynomial and Softplus activation under CKKS, balancing accuracy and privacy.
Abstract:Sign Language Recognition (SLR) involves the automatic identification and classification of sign gestures from images or video, converting them into text or speech to improve accessibility for the hearing-impaired community. In Bangladesh, Bangla Sign Language (BdSL) serves as the primary mode of communication for many individuals with hearing impairments. This study fine-tunes state-of-the-art video transformer architectures -- VideoMAE, ViViT, and TimeSformer -- on BdSLW60 (arXiv:2402.08635), a small-scale BdSL dataset with 60 frequent signs. We standardized the videos to 30 FPS, resulting in 9,307 user trial clips. To evaluate scalability and robustness, the models were also fine-tuned on BdSLW401 (arXiv:2503.02360), a large-scale dataset with 401 sign classes. Additionally, we benchmark performance against public datasets, including LSA64 and WLASL. Data augmentation techniques such as random cropping, horizontal flipping, and short-side scaling were applied to improve model robustness. To ensure balanced evaluation across folds during model selection, we employed 10-fold stratified cross-validation on the training set, while signer-independent evaluation was carried out using held-out test data from unseen users U4 and U8. Results show that video transformer models significantly outperform traditional machine learning and deep learning approaches. Performance is influenced by factors such as dataset size, video quality, frame distribution, frame rate, and model architecture. Among the models, the VideoMAE variant (MCG-NJU/videomae-base-finetuned-kinetics) achieved the highest accuracies of 95.5% on the frame rate corrected BdSLW60 dataset and 81.04% on the front-facing signs of BdSLW401 -- demonstrating strong potential for scalable and accurate BdSL recognition.




Abstract:In the era of data-driven decision-making, ensuring the privacy and security of shared data is paramount across various domains. Applying existing deep neural networks (DNNs) to encrypted data is critical and often compromises performance, security, and computational overhead. To address these limitations, this research introduces a secure framework consisting of a learnable encryption method based on the block-pixel operation to encrypt the data and subsequently integrate it with the Vision Transformer (ViT). The proposed framework ensures data privacy and security by creating unique scrambling patterns per key, providing robust performance against adversarial attacks without compromising computational efficiency and data integrity. The framework was tested on sensitive medical datasets to validate its efficacy, proving its ability to handle highly confidential information securely. The suggested framework was validated with a 94\% success rate after extensive testing on real-world datasets, such as MRI brain tumors and histological scans of lung and colon cancers. Additionally, the framework was tested under diverse adversarial attempts against secure data sharing with optimum performance and demonstrated its effectiveness in various threat scenarios. These comprehensive analyses underscore its robustness, making it a trustworthy solution for secure data sharing in critical applications.



Abstract:Privacy-preserving and secure data sharing are critical for medical image analysis while maintaining accuracy and minimizing computational overhead are also crucial. Applying existing deep neural networks (DNNs) to encrypted medical data is not always easy and often compromises performance and security. To address these limitations, this research introduces a secure framework consisting of a learnable encryption method based on the block-pixel operation to encrypt the data and subsequently integrate it with the Vision Transformer (ViT). The proposed framework ensures data privacy and security by creating unique scrambling patterns per key, providing robust performance against leading bit attacks and minimum difference attacks.




Abstract:Intrusion detection has been a commonly adopted detective security measures to safeguard systems and networks from various threats. A robust intrusion detection system (IDS) can essentially mitigate threats by providing alerts. In networks based IDS, typically we deal with cyber threats like distributed denial of service (DDoS), spoofing, reconnaissance, brute-force, botnets, and so on. In order to detect these threats various machine learning (ML) and deep learning (DL) models have been proposed. However, one of the key challenges with these predictive approaches is the presence of false positive (FP) and false negative (FN) instances. This FPs and FNs within any black-box intrusion detection system (IDS) make the decision-making task of an analyst further complicated. In this paper, we propose an explainable artificial intelligence (XAI) based visual analysis approach using overlapping SHAP plots that presents the feature explanation to identify potential false positive and false negatives in IDS. Our approach can further provide guidance to security analysts for effective decision-making. We present case study with multiple publicly available network traffic datasets to showcase the efficacy of our approach for identifying false positive and false negative instances. Our use-case scenarios provide clear guidance for analysts on how to use the visual analysis approach for reliable course-of-actions against such threats.




Abstract:Healthcare industries face challenges when experiencing rare diseases due to limited samples. Artificial Intelligence (AI) communities overcome this situation to create synthetic data which is an ethical and privacy issue in the medical domain. This research introduces the CAT-U-Net framework as a new approach to overcome these limitations, which enhances feature extraction from medical images without the need for large datasets. The proposed framework adds an extra concatenation layer with downsampling parts, thereby improving its ability to learn from limited data while maintaining patient privacy. To validate, the proposed framework's robustness, different medical conditioning datasets were utilized including COVID-19, brain tumors, and wrist fractures. The framework achieved nearly 98% reconstruction accuracy, with a Dice coefficient close to 0.946. The proposed CAT-U-Net has the potential to make a big difference in medical image diagnostics in settings with limited data.
Abstract:In the dynamic and ever-changing domain of Unmanned Aerial Vehicles (UAVs), the utmost importance lies in guaranteeing resilient and lucid security measures. This study highlights the necessity of implementing a Zero Trust Architecture (ZTA) to enhance the security of unmanned aerial vehicles (UAVs), hence departing from conventional perimeter defences that may expose vulnerabilities. The Zero Trust Architecture (ZTA) paradigm requires a rigorous and continuous process of authenticating all network entities and communications. The accuracy of our methodology in detecting and identifying unmanned aerial vehicles (UAVs) is 84.59\%. This is achieved by utilizing Radio Frequency (RF) signals within a Deep Learning framework, a unique method. Precise identification is crucial in Zero Trust Architecture (ZTA), as it determines network access. In addition, the use of eXplainable Artificial Intelligence (XAI) tools such as SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME) contributes to the improvement of the model's transparency and interpretability. Adherence to Zero Trust Architecture (ZTA) standards guarantees that the classifications of unmanned aerial vehicles (UAVs) are verifiable and comprehensible, enhancing security within the UAV field.




Abstract:The expanding role of Artificial Intelligence (AI) in diverse engineering domains highlights the challenges associated with deploying AI models in new operational environments, involving substantial investments in data collection and model training. Rapid application of AI necessitates evaluating the feasibility of utilizing pre-trained models in unobserved operational settings with minimal or no additional data. However, interpreting the opaque nature of AI's black-box models remains a persistent challenge. Addressing this issue, this paper proposes a science-based certification methodology to assess the viability of employing pre-trained data-driven models in untrained operational environments. The methodology advocates a profound integration of domain knowledge, leveraging theoretical and analytical models from physics and related disciplines, with data-driven AI models. This novel approach introduces tools to facilitate the development of secure engineering systems, providing decision-makers with confidence in the trustworthiness and safety of AI-based models across diverse environments characterized by limited training data and dynamic, uncertain conditions. The paper demonstrates the efficacy of this methodology in real-world safety-critical scenarios, particularly in the context of traffic state estimation. Through simulation results, the study illustrates how the proposed methodology efficiently quantifies physical inconsistencies exhibited by pre-trained AI models. By utilizing analytical models, the methodology offers a means to gauge the applicability of pre-trained AI models in new operational environments. This research contributes to advancing the understanding and deployment of AI models, offering a robust certification framework that enhances confidence in their reliability and safety across a spectrum of operational conditions.
Abstract:Crashes and delays at Railroad Highway Grade Crossings (RHGC), where highways and railroads intersect, pose significant safety concerns for the U.S. Federal Railroad Administration (FRA). Despite the critical importance of addressing accidents and traffic delays at highway-railroad intersections, there is a notable dearth of research on practical solutions for managing these issues. In response to this gap in the literature, our study introduces an intelligent system that leverages machine learning and computer vision techniques to enhance safety at Railroad Highway Grade crossings (RHGC). This research proposed a Non-Maximum Suppression (NMS)- based ensemble model that integrates a variety of YOLO variants, specifically YOLOv5S, YOLOv5M, and YOLOv5L, for grade-crossing object detection, utilizes segmentation techniques from the UNet architecture for detecting approaching rail at a grade crossing. Both methods are implemented on a Raspberry Pi. Moreover, the strategy employs high-definition cameras installed at the RHGC. This framework enables the system to monitor objects within the Region of Interest (ROI) at crossings, detect the approach of trains, and clear the crossing area before a train arrives. Regarding accuracy, precision, recall, and Intersection over Union (IoU), the proposed state-of-the-art NMS-based object detection ensemble model achieved 96% precision. In addition, the UNet segmentation model obtained a 98% IoU value. This automated railroad grade crossing system powered by artificial intelligence represents a promising solution for enhancing safety at highway-railroad intersections.