Abstract: Recent progress in image-based medical disease detection faces challenges such as limited annotated datasets, inadequate spatial feature analysis, data security issues, and inefficient training frameworks. This study introduces a data-efficient image transformer (DeIT)-based approach that overcomes these challenges by using multiscale patch embedding for richer feature extraction and stratified weighted random sampling to address class imbalance. The model also incorporates a LoRA-enhanced transformer encoder, a distillation framework, and federated learning for decentralized training, improving both efficiency and data security. As a result, it achieves state-of-the-art performance, with the highest AUC, F1 score, precision, and Top-5 accuracy, together with the lowest loss. Additionally, Grad-CAM++ visualizations improve interpretability by highlighting critical pathological regions, which enhances the model's clinical relevance. These results highlight the potential of this approach to advance AI-powered medical imaging and disease detection.
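As a rough illustration of two of the components named above, the following is a minimal PyTorch sketch, not taken from the paper: a LoRA adapter wrapped around a frozen linear projection (as might sit inside a DeiT-style encoder block) and an inverse-frequency weighted sampler for class-imbalanced training. The names LoRALinear and make_balanced_sampler, and all hyperparameters, are illustrative assumptions.

```python
# Hypothetical sketch; the paper's exact architecture and settings are not specified here.
import torch
import torch.nn as nn
from torch.utils.data import WeightedRandomSampler

class LoRALinear(nn.Module):
    """Frozen base projection plus a low-rank trainable update: W x + (B A) x."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # freeze pretrained weights
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

def make_balanced_sampler(labels: torch.Tensor) -> WeightedRandomSampler:
    """Sample each class with probability proportional to 1 / class frequency."""
    counts = torch.bincount(labels)
    weights = (1.0 / counts.float())[labels]      # one weight per training sample
    return WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)

# Example: adapt a 384-dim qkv projection (DeiT-Small-like) and build a sampler.
qkv = LoRALinear(nn.Linear(384, 384 * 3), rank=8)
sampler = make_balanced_sampler(torch.tensor([0, 0, 0, 1, 2, 2]))
```

In practice, one would wrap the attention and MLP projections of each encoder block this way and pass the sampler to the DataLoader; the distillation and federated-learning components are omitted from this sketch.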
Abstract: Recent advancements in text-to-image diffusion models are hindered by high computational demands, limiting accessibility and scalability. This paper introduces KDC-Diff, a novel Stable Diffusion framework that improves efficiency while maintaining image quality. KDC-Diff features a streamlined U-Net architecture with nearly half the parameters of the original U-Net (482M), significantly reducing model complexity. To ensure high-fidelity generation, we propose a dual-layered distillation strategy that transfers semantic and structural knowledge from a teacher to a compact student model while minimizing quality degradation. Additionally, replay-based continual learning is integrated to mitigate catastrophic forgetting, allowing the model to retain prior knowledge while adapting to new data. Despite operating under extremely low computational resources, KDC-Diff achieves state-of-the-art performance on the Oxford Flowers and Butterflies & Moths 100 Species datasets, with competitive results on metrics such as FID, CLIP score, and LPIPS. Moreover, it significantly reduces inference time compared with existing models. These results establish KDC-Diff as a highly efficient and adaptable solution for text-to-image generation, particularly in computationally constrained environments.
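For concreteness, the following is a minimal PyTorch sketch, under assumptions not stated in the abstract, of what a dual-layered (output-level plus feature-level) distillation objective and a simple replay buffer for rehearsal-based continual learning could look like. The names distillation_loss and ReplayBuffer, the loss weighting, and the tensor shapes are hypothetical.

```python
# Hypothetical sketch of a dual-layered distillation objective: the student is
# penalized both at the output (noise-prediction) level and at an intermediate
# feature level relative to a frozen teacher U-Net.
import random
import torch
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out,
                      student_feat, teacher_feat,
                      lambda_feat: float = 0.5) -> torch.Tensor:
    """Output-level MSE plus weighted feature-level MSE against the teacher."""
    out_loss = F.mse_loss(student_out, teacher_out.detach())
    feat_loss = F.mse_loss(student_feat, teacher_feat.detach())
    return out_loss + lambda_feat * feat_loss

class ReplayBuffer:
    """Fixed-size reservoir of past training items for rehearsal."""
    def __init__(self, capacity: int = 512):
        self.capacity, self.items, self.seen = capacity, [], 0
    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:                       # reservoir sampling keeps a uniform subset
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item
    def sample(self, k: int):
        return random.sample(self.items, min(k, len(self.items)))

# Toy usage: random tensors stand in for noise predictions (B, 4, 64, 64 latents)
# and mid-block features (B, C, H, W).
s_out, t_out = torch.randn(2, 4, 64, 64), torch.randn(2, 4, 64, 64)
s_feat, t_feat = torch.randn(2, 1280, 8, 8), torch.randn(2, 1280, 8, 8)
loss = distillation_loss(s_out, t_out, s_feat, t_feat)
```

A training loop along these lines would mix replayed samples from earlier datasets into each batch alongside new data, so the compact student U-Net is supervised by the teacher while also rehearsing prior knowledge.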