Abstract:This study aims to address the growing challenge of distinguishing computer-generated imagery (CGI) from authentic digital images across three different color spaces; RGB, YCbCr, and HSV. Given the limitations of existing classification methods in handling the complexity and variability of CGI, this research proposes a Swin Transformer based model for accurate differentiation between natural and synthetic images. The proposed model leverages the Swin Transformer's hierarchical architecture to capture local and global features for distinguishing CGI from natural images. Its performance was assessed through intra- and inter-dataset testing across three datasets: CiFAKE, JSSSTU, and Columbia. The model was evaluated individually on each dataset (D1, D2, D3) and on the combined datasets (D1+D2+D3) to test its robustness and domain generalization. To address dataset imbalance, data augmentation techniques were applied. Additionally, t-SNE visualization was used to demonstrate the feature separability achieved by the Swin Transformer across the selected color spaces. The model's performance was tested across all color schemes, with the RGB color scheme yielding the highest accuracy for each dataset. As a result, RGB was selected for domain generalization analysis and compared with other CNN-based models, VGG-19 and ResNet-50. The comparative results demonstrate the proposed model's effectiveness in detecting CGI, highlighting its robustness and reliability in both intra-dataset and inter-dataset evaluations. The findings of this study highlight the Swin Transformer model's potential as an advanced tool for digital image forensics, particularly in distinguishing CGI from natural images. The model's strong performance indicates its capability for domain generalization, making it a valuable asset in scenarios requiring precise and reliable image classification.
Abstract:This paper presents a deep learning framework for the multi-class classification of gastrointestinal abnormalities in Video Capsule Endoscopy (VCE) frames. The aim is to automate the identification of ten GI abnormality classes, including angioectasia, bleeding, and ulcers, thereby reducing the diagnostic burden on gastroenterologists. Utilizing an ensemble of DenseNet and ResNet architectures, the proposed model achieves an overall accuracy of 94\% across a well-structured dataset. Precision scores range from 0.56 for erythema to 1.00 for worms, with recall rates peaking at 98% for normal findings. This study emphasizes the importance of robust data preprocessing techniques, including normalization and augmentation, in enhancing model performance. The contributions of this work lie in developing an effective AI-driven tool that streamlines the diagnostic process in gastroenterology, ultimately improving patient care and clinical outcomes.
Abstract:The rapid advancements in computer graphics have greatly enhanced the quality of computer-generated images (CGI), making them increasingly indistinguishable from authentic images captured by digital cameras (ADI). This indistinguishability poses significant challenges, especially in an era of widespread misinformation and digitally fabricated content. This research proposes a novel approach to classify CGI and ADI using Swin Transformers and preprocessing techniques involving RGB and CbCrY color frame analysis. By harnessing the capabilities of Swin Transformers, our method foregoes handcrafted features instead of relying on raw pixel data for model training. This approach achieves state-of-the-art accuracy while offering substantial improvements in processing speed and robustness against joint image manipulations such as noise addition, blurring, and JPEG compression. Our findings highlight the potential of Swin Transformers combined with advanced color frame analysis for effective and efficient image authenticity detection.
Abstract:\textbf{Purpose} This study aims to address the growing challenge of distinguishing computer-generated imagery (CGI) from authentic digital images in the RGB color space. Given the limitations of existing classification methods in handling the complexity and variability of CGI, this research proposes a Swin Transformer-based model for accurate differentiation between natural and synthetic images. \textbf{Methods} The proposed model leverages the Swin Transformer's hierarchical architecture to capture local and global features crucial for distinguishing CGI from natural images. The model's performance was evaluated through intra-dataset and inter-dataset testing across three distinct datasets: CiFAKE, JSSSTU, and Columbia. The datasets were tested individually (D1, D2, D3) and in combination (D1+D2+D3) to assess the model's robustness and domain generalization capabilities. \textbf{Results} The Swin Transformer-based model demonstrated high accuracy, consistently achieving a range of 97-99\% across all datasets and testing scenarios. These results confirm the model's effectiveness in detecting CGI, showcasing its robustness and reliability in both intra-dataset and inter-dataset evaluations. \textbf{Conclusion} The findings of this study highlight the Swin Transformer model's potential as an advanced tool for digital image forensics, particularly in distinguishing CGI from natural images. The model's strong performance across multiple datasets indicates its capability for domain generalization, making it a valuable asset in scenarios requiring precise and reliable image classification.
Abstract:In recent years, the focus is on improving the diagnosis of diabetic retinopathy (DR) using machine learning and deep learning technologies. Researchers have explored various approaches, including the use of high-definition medical imaging, AI-driven algorithms such as convolutional neural networks (CNNs) and generative adversarial networks (GANs). Among all the available tools, CNNs have emerged as a preferred tool due to their superior classification accuracy and efficiency. Although the accuracy of CNNs is comparatively better but it can be improved by introducing some hybrid models by combining various machine learning and deep learning models. Therefore, in this paper, an ensemble learning technique is proposed for early detection and management of DR with higher accuracy. The proposed model is tested on the APTOS dataset and it is showing supremacy on the validation accuracy ($99\%)$ in comparison to the previous models. Hence, the model can be helpful for early detection and treatment of the DR, thereby enhancing the overall quality of care for affected individuals.
Abstract:An increasing number of classification approaches have been developed to address the issue of image rebroadcast and recapturing, a standard attack strategy in insurance frauds, face spoofing, and video piracy. However, most of them neglected scale variations and domain generalization scenarios, performing poorly in instances involving domain shifts, typically made worse by inter-domain and cross-domain scale variances. To overcome these issues, we propose a cascaded data augmentation and SWIN transformer domain generalization framework (DAST-DG) in the current research work Initially, we examine the disparity in dataset representation. A feature generator is trained to make authentic images from various domains indistinguishable. This process is then applied to recaptured images, creating a dual adversarial learning setup. Extensive experiments demonstrate that our approach is practical and surpasses state-of-the-art methods across different databases. Our model achieves an accuracy of approximately 82\% with a precision of 95\% on high-variance datasets.