Abstract: Vision--Language Models (VLMs) show significant promise for Medical Visual Question Answering (VQA), yet their deployment in clinical settings is hindered by severe vulnerability to adversarial attacks. Standard adversarial training, while effective for simpler tasks, often degrades both generalization performance and the quality of generated clinical reasoning. We introduce SafeMed-R1, a hybrid defense framework that provides robust performance while preserving high-quality, interpretable medical reasoning. SafeMed-R1 employs a two-stage approach: at training time, we integrate Adversarial Training with Group Relative Policy Optimization (AT-GRPO) to explicitly robustify the reasoning process against worst-case perturbations; at inference time, we augment the model with Randomized Smoothing to provide certified $L_2$-norm robustness guarantees. We evaluate SafeMed-R1 on the OmniMedVQA benchmark, which spans eight medical imaging modalities and comprises over 88,000 samples. Our experiments reveal that standard fine-tuned VLMs, despite achieving 95\% accuracy on clean inputs, collapse to approximately 25\% under PGD attacks. In contrast, SafeMed-R1 maintains 84.45\% accuracy under the same adversarial conditions, a 59-percentage-point improvement in robustness. Furthermore, we demonstrate that models trained with explicit chain-of-thought reasoning exhibit superior adversarial robustness compared to instruction-only variants, suggesting a synergy between interpretability and security in medical AI systems.
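As an illustration of the inference-time stage, the following is a minimal sketch of randomized smoothing on a toy classifier, not the SafeMed-R1 implementation. It assumes isotropic Gaussian noise and the widely used bound $R = \sigma \, \Phi^{-1}(\underline{p_A})$ for the certified $L_2$ radius; the names `smoothed_predict`, `certified_l2_radius`, and `toy_classifier` are hypothetical, and the abstract does not specify the noise level, sample count, or confidence procedure actually used.

```python
import random
from collections import Counter
from statistics import NormalDist


def smoothed_predict(base_classifier, x, sigma, n_samples, rng):
    """Majority vote of base_classifier over Gaussian-perturbed copies of x.

    Returns the winning label and the empirical fraction of votes it received.
    """
    votes = Counter()
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[base_classifier(noisy)] += 1
    top_class, top_count = votes.most_common(1)[0]
    return top_class, top_count / n_samples


def certified_l2_radius(p_lower, sigma):
    """Certified L2 radius sigma * Phi^{-1}(p_lower) for Gaussian smoothing.

    p_lower must be a lower confidence bound on the top-class probability.
    The raw vote fraction is NOT such a bound; a real certification would
    first apply a binomial confidence interval (an assumption left out here).
    """
    if p_lower <= 0.5:
        return 0.0  # no certificate: the top class is not a clear majority
    p = min(p_lower, 1.0 - 1e-9)  # inv_cdf is undefined at exactly 1.0
    return sigma * NormalDist().inv_cdf(p)


def toy_classifier(v):
    """Hypothetical stand-in for a model: thresholds the first feature."""
    return int(v[0] > 0.0)


if __name__ == "__main__":
    rng = random.Random(0)
    label, frac = smoothed_predict(toy_classifier, [0.3], 0.25, 1000, rng)
    # Illustrative only: frac is used directly, without a confidence bound.
    print(label, frac, certified_l2_radius(frac, 0.25))
```

In this sketch a larger `sigma` widens the radius that can be certified but makes the noisy votes less reliable, which reflects the usual trade-off between certified robustness and clean accuracy.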
Abstract: Stunting is a significant issue in Indonesian healthcare, causing lower cognitive function, lower productivity, weakened immunity, delayed neurodevelopment, and degenerative diseases. In regions with a high prevalence of stunting and limited welfare resources, identifying children in need of treatment is critical. The diagnostic process often faces challenges, such as inexperienced medical workers, incompatible anthropometric equipment, and inefficient medical bureaucracy. To address these issues, a load cell sensor and an ultrasonic sensor can provide suitable anthropometric equipment and streamline the medical bureaucracy for stunting detection. This paper also employs machine learning for stunting detection based on the sensor readings. The experimental results show that the sensitivity of the load cell sensor and the ultrasonic sensor is 0.9919 and 0.9986, respectively. The machine learning model distinguishes three classes (normal, stunted, and stunting) with an accuracy of 98\%.