Abstract: Ultrasound is widely used in clinical practice due to its portability, cost-effectiveness, safety, and real-time imaging capabilities. However, image acquisition and interpretation remain highly operator-dependent, motivating the development of robust AI-assisted analysis methods. Vision-language models (VLMs) have recently demonstrated strong multimodal reasoning capabilities and competitive performance in medical image analysis, including ultrasound, yet emerging evidence raises significant concerns about their trustworthiness. Adversarial robustness is particularly critical because medical VLMs (Med-VLMs) operate via natural-language instructions, which makes prompt formulation a realistic and practically exploitable point of vulnerability. Small variations (typos, shorthand, underspecified requests, or ambiguous wording) can meaningfully shift model outputs. We propose a scalable adversarial evaluation framework that uses a large language model (LLM) to generate clinically plausible adversarial prompt variants via "humanized" rewrites and minimal edits that mimic routine clinical communication. Using ultrasound multiple-choice question-answering benchmarks, we systematically assess the vulnerability of state-of-the-art Med-VLMs to these attacks, examine how attacker LLM capacity influences attack success, analyze the relationship between attack success and model confidence, and identify consistent failure patterns across models. Our results highlight realistic robustness gaps that must be addressed for safe clinical translation. Code will be released publicly following the review process.
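
As a rough illustration of how attack success on a multiple-choice benchmark might be scored, the sketch below counts an attack as successful when a question the model answered correctly with the original prompt is answered incorrectly with the rewritten prompt. This definition, the function name `attack_success_rate`, and the example answers are assumptions made for illustration; the abstract does not specify the metric used.

```python
from typing import Sequence


def attack_success_rate(
    clean_preds: Sequence[str],
    adv_preds: Sequence[str],
    labels: Sequence[str],
) -> float:
    """Fraction of originally correct answers that become wrong under attack.

    Assumed definition (not taken from the abstract): only items the model
    answered correctly with the clean prompt are eligible for a "success".
    """
    if not (len(clean_preds) == len(adv_preds) == len(labels)):
        raise ValueError("all inputs must have the same length")

    eligible = 0  # items answered correctly with the clean prompt
    flipped = 0   # eligible items answered incorrectly with the adversarial prompt
    for clean, adv, gold in zip(clean_preds, adv_preds, labels):
        if clean == gold:
            eligible += 1
            if adv != gold:
                flipped += 1

    # With no eligible items there is nothing to attack; report 0.0.
    return flipped / eligible if eligible else 0.0


# Hypothetical answers to four multiple-choice questions.
rate = attack_success_rate(
    clean_preds=["A", "B", "C", "D"],
    adv_preds=["A", "C", "C", "A"],
    labels=["A", "B", "C", "A"],
)
```

Restricting the denominator to originally correct items keeps the score from being inflated by questions the model already got wrong before any prompt was modified.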




Abstract: Training can improve human decision-making performance: after several training sessions, a person can complete a task quickly and accurately. However, decision-making always involves a trade-off between accuracy and response time, and factors such as age and drug abuse can affect the decision-making process. This study examines how training improves the performance of different age groups on a random dot motion (RDM) task. Participants were divided into two groups, old and young. They underwent a three-phase training and then repeated the same RDM task. A hierarchical drift-diffusion model was used to analyze the participants' responses and to determine how the model's parameters changed after training in each age group. The results show that after training, participants accumulated sensory information faster, reflected in an increased drift rate. Their decision boundary decreased, indicating that they became more confident and adopted a lower decision threshold. Additionally, the old group had a higher boundary and a lower drift rate than the young group both before and after training, and the difference between the two groups' parameters was smaller after training.
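
To make the roles of the two parameters concrete, the following is a minimal simulation sketch of a basic drift-diffusion process, not the hierarchical model fitting used in the study. It assumes an unbiased starting point midway between the boundaries, unit noise, and no non-decision time; the function names and all parameter values are illustrative and not taken from the paper.

```python
import math
import random


def simulate_ddm_trial(drift, boundary, rng, noise=1.0, dt=0.001, max_time=10.0):
    """Simulate one trial of a basic drift-diffusion process.

    Evidence starts midway between 0 and `boundary` (assumed unbiased start).
    Returns (hit_upper, rt); hit_upper is None if no boundary is reached
    within max_time.
    """
    evidence = boundary / 2.0
    step_sd = noise * math.sqrt(dt)  # standard deviation of one noise increment
    t = 0.0
    while t < max_time:
        evidence += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
        if evidence >= boundary:
            return True, t   # upper boundary: treated as the correct choice
        if evidence <= 0.0:
            return False, t  # lower boundary: treated as an error
    return None, max_time


def summarize(drift, boundary, n_trials=300, seed=0):
    """Return (accuracy, mean response time) over simulated trials."""
    rng = random.Random(seed)
    trials = [simulate_ddm_trial(drift, boundary, rng) for _ in range(n_trials)]
    decided = [(choice, rt) for choice, rt in trials if choice is not None]
    accuracy = sum(choice for choice, _ in decided) / len(decided)
    mean_rt = sum(rt for _, rt in decided) / len(decided)
    return accuracy, mean_rt


# Illustrative comparisons with made-up parameter values.
acc_low_drift, _ = summarize(drift=0.5, boundary=2.0)
acc_high_drift, _ = summarize(drift=2.0, boundary=2.0)
_, rt_wide = summarize(drift=1.0, boundary=2.0)
_, rt_narrow = summarize(drift=1.0, boundary=1.0)
```

In this simplified setting, a higher drift rate raises accuracy at a fixed boundary, and a lower boundary shortens response times at a fixed drift rate, which is the kind of accuracy and speed trade-off the parameters are meant to capture.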