Abstract:Augmented Reality (AR) systems, while enhancing task performance through real-time guidance, pose risks of inducing cognitive tunneling-a hyperfocus on virtual content that compromises situational awareness (SA) in safety-critical scenarios. This paper investigates SA in AR-guided cardiopulmonary resuscitation (CPR), where responders must balance effective compressions with vigilance to unpredictable hazards (e.g., patient vomiting). We developed an AR app on a Magic Leap 2 that overlays real-time CPR feedback (compression depth and rate) and conducted a user study with simulated unexpected incidents (e.g., bleeding) to evaluate SA, in which SA metrics were collected via observation and questionnaires administered during freeze-probe events. Eye tracking analysis revealed that higher SA levels were associated with greater saccadic amplitude and velocity, and with reduced proportion and frequency of fixations on virtual content. To predict SA, we propose FixGraphPool, a graph neural network that structures gaze events (fixations, saccades) into spatiotemporal graphs, effectively capturing dynamic attentional patterns. Our model achieved 83.0% accuracy (F1=81.0%), outperforming feature-based machine learning and state-of-the-art time-series models by leveraging domain knowledge and spatial-temporal information encoded in ET data. These findings demonstrate the potential of eye tracking for SA modeling in AR and highlight its utility in designing AR systems that ensure user safety and situational awareness.
Abstract:The virtual content in augmented reality (AR) can introduce misleading or harmful information, leading to semantic misunderstandings or user errors. In this work, we focus on visual information manipulation (VIM) attacks in AR where virtual content changes the meaning of real-world scenes in subtle but impactful ways. We introduce a taxonomy that categorizes these attacks into three formats: character, phrase, and pattern manipulation, and three purposes: information replacement, information obfuscation, and extra wrong information. Based on the taxonomy, we construct a dataset, AR-VIM. It consists of 452 raw-AR video pairs spanning 202 different scenes, each simulating a real-world AR scenario. To detect such attacks, we propose a multimodal semantic reasoning framework, VIM-Sense. It combines the language and visual understanding capabilities of vision-language models (VLMs) with optical character recognition (OCR)-based textual analysis. VIM-Sense achieves an attack detection accuracy of 88.94% on AR-VIM, consistently outperforming vision-only and text-only baselines. The system reaches an average attack detection latency of 7.07 seconds in a simulated video processing framework and 7.17 seconds in a real-world evaluation conducted on a mobile Android AR application.
Abstract:We consider distributed machine learning (ML) through unmanned aerial vehicles (UAVs) for geo-distributed device clusters. We propose five new technologies/techniques: (i) stratified UAV swarms with leader, worker, and coordinator UAVs, (ii) hierarchical nested personalized federated learning (HN-PFL): a holistic distributed ML framework for personalized model training across the worker-leader-core network hierarchy, (iii) cooperative UAV resource pooling for distributed ML using the UAVs' local computational capabilities, (iv) aerial data caching and relaying for efficient data relaying to conduct ML, and (v) concept/model drift, capturing online data variations at the devices. We split the UAV-enabled model training problem as two parts. (a) Network-aware HN-PFL, where we optimize a tradeoff between energy consumption and ML model performance by configuring data offloading among devices-UAVs and UAV-UAVs, UAVs' CPU frequencies, and mini-batch sizes subject to communication/computation network heterogeneity. We tackle this optimization problem via the method of posynomial condensation and propose a distributed algorithm with a performance guarantee. (b) Macro-trajectory and learning duration design, which we formulate as a sequential decision making problem, tackled via deep reinforcement learning. Our simulations demonstrate the superiority of our methodology with regards to the distributed ML performance, the optimization of network resources, and the swarm trajectory efficiency.