Abstract: AI image generators create both photorealistic images and stylized art, necessitating robust detectors that maintain performance under common post-processing transformations (JPEG compression, blur, downscaling). Existing methods optimize single metrics without addressing deployment-critical factors such as operating point selection and fixed-threshold robustness. This work addresses misleading robustness estimates by introducing a fixed-threshold evaluation protocol that holds decision thresholds, selected once on clean validation data, fixed across all post-processing transformations. Traditional methods retune thresholds per condition, artificially inflating robustness estimates and masking deployment failures. We report deployment-relevant performance at three operating points (Low-FPR, ROC-optimal, Best-F1) under systematic degradation testing, using a lightweight CNN-ViT hybrid with gated fusion and optional frequency enhancement. Our evaluation exposes a statistically validated forensic-semantic spectrum: frequency-aided CNNs excel on pristine photos but collapse under compression (93.33% to 61.49%), whereas ViTs degrade minimally (92.86% to 88.36%) thanks to robust semantic pattern recognition. Multi-seed experiments show that all architectures achieve roughly 0.15 higher AUROC on artistic content (0.901-0.907) than on photorealistic images (0.747-0.759), confirming that semantic patterns provide fundamentally more reliable detection cues than forensic artifacts. Our hybrid approach achieves balanced cross-domain performance: 91.4% accuracy on tiny-genimage photos, 89.7% on AiArtData art/graphics, and a competitive 98.3% on CIFAKE. Fixed-threshold evaluation eliminates retuning inflation, reveals genuine robustness gaps, and yields actionable deployment guidance: prefer CNNs for clean photo verification, ViTs for compressed content, and hybrids for art/graphics screening.
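
A minimal sketch of the fixed-threshold protocol described above, assuming scikit-learn; the helper names, the 1% target FPR, and the transform dictionary are illustrative placeholders, not the paper's actual code:

```python
import numpy as np
from sklearn.metrics import roc_curve

def pick_threshold_low_fpr(val_scores, val_labels, target_fpr=0.01):
    """Choose a decision threshold ONCE, on clean validation data,
    at the Low-FPR operating point (largest threshold with FPR <= target)."""
    fpr, _, thresholds = roc_curve(val_labels, val_scores)
    idx = np.searchsorted(fpr, target_fpr, side="right") - 1
    return thresholds[max(idx, 0)]

def evaluate_fixed(score_fn, test_images, test_labels, threshold, transforms):
    """Apply the SAME threshold to every post-processing condition;
    per-condition retuning is exactly what this protocol forbids."""
    labels = np.asarray(test_labels)
    results = {}
    for name, tf in transforms.items():  # e.g. {"jpeg_q50": ..., "blur": ...}
        scores = np.asarray(score_fn([tf(x) for x in test_images]))
        preds = (scores >= threshold).astype(int)
        results[name] = (preds == labels).mean()
    return results
```
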
Abstract: Multimodal chest X-ray analysis often fine-tunes large vision-language models, which is computationally costly. We study parameter-efficient training (PET) strategies, including frozen encoders, BitFit, LoRA, and adapters, for multi-label classification on the Indiana University Chest X-Ray dataset (3,851 image-report pairs; 579 test samples). To mitigate data leakage, we redact pathology terms from the reports used as text inputs while retaining clinical context. Under a fixed parameter budget (2.37M parameters, 2.51% of the total), all PET variants achieve AUROC between 0.892 and 0.908, outperforming full fine-tuning (0.770 AUROC with 94.3M trainable parameters), a 40x reduction in trainable parameters. External validation on CheXpert (224,316 images, 58x larger) confirms scalability: all PET methods achieve >0.69 AUROC with <9% trainable parameters, with the Adapter variant performing best (0.7214 AUROC). At a matched 1.06M-parameter budget, vision-only models (0.653 AUROC) outperform multimodal models (0.641 AUROC), indicating that improvements arise primarily from parameter allocation rather than cross-modal synergy. While PET methods show degraded calibration (ECE: 0.29-0.34) compared to simpler models (ECE: 0.049), this limitation is tractable through post-hoc calibration. These findings demonstrate that frozen-encoder strategies provide superior discrimination at substantially reduced computational cost, though calibration correction is essential for clinical deployment.
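
As one concrete instance of the PET family studied above, here is a minimal BitFit sketch in PyTorch: freeze all weights and train only bias terms plus a new classification head. The ResNet-50 backbone and 14-label head are assumptions for illustration; the paper itself uses a vision-language encoder.

```python
import torch
import torchvision.models as models

# Illustrative backbone (NOT the paper's encoder); 14 labels assumed.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = torch.nn.Linear(model.fc.in_features, 14)

for name, param in model.named_parameters():
    # BitFit: train biases and the new head; freeze all other weights.
    param.requires_grad = name.endswith(".bias") or name.startswith("fc.")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable/1e6:.2f}M / {total/1e6:.1f}M "
      f"({100 * trainable / total:.2f}%)")
```
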

Abstract: Accurate brain tumor classification in MRI images is critical for timely diagnosis and treatment planning. While deep learning models such as ResNet-18 and VGG-16 achieve high accuracy, they come with increased complexity and computational demands. This study presents a comparative analysis of a simple yet effective custom Convolutional Neural Network (CNN) architecture against the pre-trained ResNet-18 and VGG-16 models for brain tumor classification, using two publicly available datasets: Br35H :: Brain Tumor Detection 2020 and the Brain Tumor MRI Dataset. Despite its lower complexity, the custom CNN demonstrates performance competitive with the pre-trained ResNet-18 and VGG-16 models. In binary classification, the custom CNN achieved 98.67% accuracy on the Br35H dataset and 99.62% on the Brain Tumor MRI Dataset. For multi-class classification, the custom CNN, with a slight architectural modification, achieved 98.09% accuracy on the Brain Tumor MRI Dataset. ResNet-18 and VGG-16 maintained comparably high performance, but the custom CNNs provided a more computationally efficient alternative. Additionally, the custom CNNs were evaluated under few-shot learning (0, 5, 10, 15, 20, 40, and 80 shots) to assess their robustness, showing notable accuracy gains as the number of shots increased. These results highlight well-designed, low-complexity CNN architectures as effective and computationally efficient alternatives to deeper, pre-trained models for medical imaging tasks such as brain tumor classification, and encourage further exploration in this direction.
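
The abstract does not publish the custom architecture, so the following is a hypothetical PyTorch sketch of a low-complexity CNN of the kind described; the layer widths and single-channel grayscale input are assumptions:

```python
import torch
import torch.nn as nn

class SimpleTumorCNN(nn.Module):
    """Hypothetical lightweight CNN for MRI tumor classification."""
    def __init__(self, num_classes=2):  # num_classes=4 for the multi-class variant
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling keeps the head tiny
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = SimpleTumorCNN()
# ~23k parameters, orders of magnitude below VGG-16's ~138M.
print(sum(p.numel() for p in model.parameters()))
```
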