Abstract:Vertical Federated Learning (VFL) offers a privacy-preserving paradigm for Edge AI scenarios like mobile health diagnostics, where sensitive multimodal data reside on distributed, resource-constrained devices. Yet, standard VFL systems often suffer performance limitations due to simplistic feature fusion. This paper introduces HybridVFL, a novel framework designed to overcome this bottleneck by employing client-side feature disentanglement paired with a server-side cross-modal transformer for context-aware fusion. Through systematic evaluation on the multimodal HAM10000 skin lesion dataset, we demonstrate that HybridVFL significantly outperforms standard federated baselines, validating the criticality of advanced fusion mechanisms in robust, privacy-preserving systems.
Abstract:In recent years, high-performance computer vision models have achieved remarkable success in medical imaging, with some skin lesion classification systems even surpassing dermatology specialists in diagnostic accuracy. However, such models are computationally intensive and large in size, making them unsuitable for deployment on edge devices. In addition, strict privacy constraints hinder centralized data management, motivating the adoption of Federated Learning (FL). To address these challenges, this study proposes a skewness-guided pruning method that selectively prunes the Multi-Head Self-Attention and Multi-Layer Perceptron layers of a multimodal Swin Transformer based on the statistical skewness of their output distributions. The proposed method was validated in a horizontal FL environment and shown to maintain performance while substantially reducing model complexity. Experiments on the compact Swin Transformer demonstrate approximately 36\% model size reduction with no loss in accuracy. These findings highlight the feasibility of achieving efficient model compression and privacy-preserving distributed learning for multimodal medical AI on edge devices.
Abstract:Skin color has historically been a focal point of discrimination, yet fairness research in machine learning for medical imaging often relies on coarse subgroup categories, overlooking individual-level variations. Such group-based approaches risk obscuring biases faced by outliers within subgroups. This study introduces a distribution-based framework for evaluating and mitigating individual fairness in skin lesion classification. We treat skin tone as a continuous attribute rather than a categorical label, and employ kernel density estimation (KDE) to model its distribution. We further compare twelve statistical distance metrics to quantify disparities between skin tone distributions and propose a distance-based reweighting (DRW) loss function to correct underrepresentation in minority tones. Experiments across CNN and Transformer models demonstrate: (i) the limitations of categorical reweighting in capturing individual-level disparities, and (ii) the superior performance of distribution-based reweighting, particularly with Fidelity Similarity (FS), Wasserstein Distance (WD), Hellinger Metric (HM), and Harmonic Mean Similarity (HS). These findings establish a robust methodology for advancing fairness at individual level in dermatological AI systems, and highlight broader implications for sensitive continuous attributes in medical image analysis.
Abstract:Recent advancements in skin lesion classification models have significantly improved accuracy, with some models even surpassing dermatologists' diagnostic performance. However, in medical practice, distrust in AI models remains a challenge. Beyond high accuracy, trustworthy, explainable diagnoses are essential. Existing explainability methods have reliability issues, with LIME-based methods suffering from inconsistency, while CAM-based methods failing to consider all classes. To address these limitations, we propose Global Class Activation Probabilistic Map Evaluation, a method that analyses all classes' activation probability maps probabilistically and at a pixel level. By visualizing the diagnostic process in a unified manner, it helps reduce the risk of misdiagnosis. Furthermore, the application of SafeML enhances the detection of false diagnoses and issues warnings to doctors and patients as needed, improving diagnostic reliability and ultimately patient safety. We evaluated our method using the ISIC datasets with MobileNetV2 and Vision Transformers.
Abstract:Fairness is a critical component of Trustworthy AI. In this paper, we focus on Machine Learning (ML) and the performance of model predictions when dealing with skin color. Unlike other sensitive attributes, the nature of skin color differs significantly. In computer vision, skin color is represented as tensor data rather than categorical values or single numerical points. However, much of the research on fairness across sensitive groups has focused on categorical features such as gender and race. This paper introduces a new technique for evaluating fairness in ML for image classification tasks, specifically without the use of annotation. To address the limitations of prior work, we handle tensor data, like skin color, without classifying it rigidly. Instead, we convert it into probability distributions and apply statistical distance measures. This novel approach allows us to capture fine-grained nuances in fairness both within and across what would traditionally be considered distinct groups. Additionally, we propose an innovative training method to mitigate the latent biases present in conventional skin tone categorization. This method leverages color distance estimates calculated through Bayesian regression with polynomial functions, ensuring a more nuanced and equitable treatment of skin color in ML models.