Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Topic:Eye Tracking

Deconstructing sentence disambiguation by joint latent modeling of reading paradigms: LLM surprisal is not enough

Feb 04, 2026

Dario Paape, Tal Linzen, Shravan Vasishth

Abstract:Using temporarily ambiguous garden-path sentences ("While the team trained the striker wondered ...") as a test case, we present a latent-process mixture model of human reading behavior across four different reading paradigms (eye tracking, uni- and bidirectional self-paced reading, Maze). The model distinguishes between garden-path probability, garden-path cost, and reanalysis cost, and yields more realistic processing cost estimates by taking into account trials with inattentive reading. We show that the model is able to reproduce empirical patterns with regard to rereading behavior, comprehension question responses, and grammaticality judgments. Cross-validation reveals that the mixture model also has better predictive fit to human reading patterns and end-of-trial task data than a mixture-free model based on GPT-2-derived surprisal values. We discuss implications for future work.

Via

Access Paper or Ask Questions

Act, Sense, Act: Learning Non-Markovian Active Perception Strategies from Large-Scale Egocentric Human Data

Feb 04, 2026

Jialiang Li, Yi Qiao, Yunhan Guo, Changwen Chen, Wenzhao Lian

Abstract:Achieving generalizable manipulation in unconstrained environments requires the robot to proactively resolve information uncertainty, i.e., the capability of active perception. However, existing methods are often confined in limited types of sensing behaviors, restricting their applicability to complex environments. In this work, we formalize active perception as a non-Markovian process driven by information gain and decision branching, providing a structured categorization of visual active perception paradigms. Building on this perspective, we introduce CoMe-VLA, a cognitive and memory-aware vision-language-action (VLA) framework that leverages large-scale human egocentric data to learn versatile exploration and manipulation priors. Our framework integrates a cognitive auxiliary head for autonomous sub-task transitions and a dual-track memory system to maintain consistent self and environmental awareness by fusing proprioceptive and visual temporal contexts. By aligning human and robot hand-eye coordination behaviors in a unified egocentric action space, we train the model progressively in three stages. Extensive experiments on a wheel-based humanoid have demonstrated strong robustness and adaptability of our proposed method across diverse long-horizon tasks spanning multiple active perception scenarios.

Via

Access Paper or Ask Questions

VRGaussianAvatar: Integrating 3D Gaussian Avatars into VR

Feb 02, 2026

Hail Song, Boram Yoon, Seokhwan Yang, Seoyoung Kang, Hyunjeong Kim, Henning Metzmacher, Woontack Woo

Abstract:We present VRGaussianAvatar, an integrated system that enables real-time full-body 3D Gaussian Splatting (3DGS) avatars in virtual reality using only head-mounted display (HMD) tracking signals. The system adopts a parallel pipeline with a VR Frontend and a GA Backend. The VR Frontend uses inverse kinematics to estimate full-body pose and streams the resulting pose along with stereo camera parameters to the backend. The GA Backend stereoscopically renders a 3DGS avatar reconstructed from a single image. To improve stereo rendering efficiency, we introduce Binocular Batching, which jointly processes left and right eye views in a single batched pass to reduce redundant computation and support high-resolution VR displays. We evaluate VRGaussianAvatar with quantitative performance tests and a within-subject user study against image- and video-based mesh avatar baselines. Results show that VRGaussianAvatar sustains interactive VR performance and yields higher perceived appearance similarity, embodiment, and plausibility. Project page and source code are available at https://vrgaussianavatar.github.io.

* Accepted as an IEEE TVCG paper at IEEE VR 2026 (journal track)

Via

Access Paper or Ask Questions

GAZELOAD A Multimodal Eye-Tracking Dataset for Mental Workload in Industrial Human-Robot Collaboration

Jan 29, 2026

Bsher Karbouj, Baha Eddin Gaaloul, Jorg Kruger

Abstract:This article describes GAZELOAD, a multimodal dataset for mental workload estimation in industrial human-robot collaboration. The data were collected in a laboratory assembly testbed where 26 participants interacted with two collaborative robots (UR5 and Franka Emika Panda) while wearing Meta ARIA smart glasses. The dataset time-synchronizes eye-tracking signals (pupil diameter, fixations, saccades, eye gaze, gaze transition entropy, fixation dispersion index) with environmental real-time and continuous measurements (illuminance) and task and robot context (bench, task block, induced faults), under controlled manipulations of task difficulty and ambient conditions. For each participant and workload-graded task block, we provide CSV files with ocular metrics aggregated into 250 ms windows, environmental logs, and self-reported mental workload ratings on a 1-10 Likert scale, organized in participant-specific folders alongside documentation. These data can be used to develop and benchmark algorithms for mental workload estimation, feature extraction, and temporal modeling in realistic industrial HRC scenarios, and to investigate the influence of environmental factors such as lighting on eye-based workload markers.

Via

Access Paper or Ask Questions

mmWave Sensing for Detecting Movement Through Thermoplastic Masks During Radiation Therapy Treatment

Jan 31, 2026

Ali Kourani, Naveed A. Abbasi, Syeda Narjis Fatima, Katsuyuki Haneda, Andreas F. Molisch

Abstract:Precision in radiation therapy relies on immobilization systems that limit patient motion. Thermoplastic masks are commonly used for this purpose, but subtle voluntary and involuntary movements such as jaw shifts, deep breathing, or eye squinting may still compromise treatment accuracy. Existing motion tracking methods are limited: optical systems require a clear line of sight and only detect surface motion, while X-ray-based tracking introduces additional ionizing radiation. This study explores the use of low-power, non-ionizing millimeter-wave (mmWave) sensing for through-mask motion detection. We characterize the RF properties of thermoplastic mask material in the 28-38 GHz range and perform motion detection using a 1 GHz bandwidth centered at 28 GHz. We use a frequency-domain system with horn antennas in a custom-built anechoic chamber to capture changes in the amplitude and phase of transmitted RF waves in response to subtle head and facial movements. These findings lay groundwork for future real-time through-mask motion tracking and future integration with multi-antenna systems and machine learning for error correction during radiotherapy.

* 5 pages, 6 figures. Conference paper

Via

Access Paper or Ask Questions

Bifocal Attention: Harmonizing Geometric and Spectral Positional Embeddings for Algorithmic Generalization

Jan 29, 2026

Kanishk Awadhiya

Abstract:Rotary Positional Embeddings (RoPE) have become the standard for Large Language Models (LLMs) due to their ability to encode relative positions through geometric rotation. However, we identify a significant limitation we term ''Spectral Rigidity'': standard RoPE utilizes a fixed geometric decay ($θ^{-i}$) optimized for local syntactic coherence, which fails to capture the long-range, periodic structures inherent in recursive logic and algorithmic reasoning. This results in a ''Structure Gap'', where models trained on shallow reasoning chains fail to extrapolate to deeper recursive steps. In this work, we introduce Bifocal Attention, an architectural paradigm that decouples positional encoding into two distinct modalities: Geometric Eyes (Standard RoPE) for precise token-level manipulation, and Spectral Eyes (Learnable Harmonic Operators) for tracking long-range recursive depth. We propose a novel training protocol, Spectral Evolution, which initializes positional frequencies as static geometric parameters but allows them to evolve via gradient descent into a harmonic basis optimized for the specific algorithmic topology of the task.

Via

Access Paper or Ask Questions

Gaze Prediction in Virtual Reality Without Eye Tracking Using Visual and Head Motion Cues

Jan 26, 2026

Christos Petrou, Harris Partaourides, Athanasios Balomenos, Yannis Kopsinis, Sotirios Chatzis

Abstract:Gaze prediction plays a critical role in Virtual Reality (VR) applications by reducing sensor-induced latency and enabling computationally demanding techniques such as foveated rendering, which rely on anticipating user attention. However, direct eye tracking is often unavailable due to hardware limitations or privacy concerns. To address this, we present a novel gaze prediction framework that combines Head-Mounted Display (HMD) motion signals with visual saliency cues derived from video frames. Our method employs UniSal, a lightweight saliency encoder, to extract visual features, which are then fused with HMD motion data and processed through a time-series prediction module. We evaluate two lightweight architectures, TSMixer and LSTM, for forecasting future gaze directions. Experiments on the EHTask dataset, along with deployment on commercial VR hardware, show that our approach consistently outperforms baselines such as Center-of-HMD and Mean Gaze. These results demonstrate the effectiveness of predictive gaze modeling in reducing perceptual lag and enhancing natural interaction in VR environments where direct eye tracking is constrained.

Via

Access Paper or Ask Questions

GuideAI: A Real-time Personalized Learning Solution with Adaptive Interventions

Jan 28, 2026

Ananya Shukla, Chaitanya Modi, Satvik Bajpai, Siddharth Siddharth

Abstract:Large Language Models (LLMs) have emerged as powerful learning tools, but they lack awareness of learners' cognitive and physiological states, limiting their adaptability to the user's learning style. Contemporary learning techniques primarily focus on structured learning paths, knowledge tracing, and generic adaptive testing but fail to address real-time learning challenges driven by cognitive load, attention fluctuations, and engagement levels. Building on findings from a formative user study (N=66), we introduce GuideAI, a multi-modal framework that enhances LLM-driven learning by integrating real-time biosensory feedback including eye gaze tracking, heart rate variability, posture detection, and digital note-taking behavior. GuideAI dynamically adapts learning content and pacing through cognitive optimizations (adjusting complexity based on learning progress markers), physiological interventions (breathing guidance and posture correction), and attention-aware strategies (redirecting focus using gaze analysis). Additionally, GuideAI supports diverse learning modalities, including text-based, image-based, audio-based, and video-based instruction, across varied knowledge domains. A preliminary study (N = 25) assessed GuideAI's impact on knowledge retention and cognitive load through standardized assessments. The results show statistically significant improvements in both problem-solving capability and recall-based knowledge assessments. Participants also experienced notable reductions in key NASA-TLX measures including mental demand, frustration levels, and effort, while simultaneously reporting enhanced perceived performance. These findings demonstrate GuideAI's potential to bridge the gap between current LLM-based learning systems and individualized learner needs, paving the way for adaptive, cognition-aware education at scale.

* Accepted for publication at the 31st International Conference on Intelligent User Interfaces (IUI 2026)

Via

Access Paper or Ask Questions

NeuroManip: Prosthetic Hand Manipulation System Based on EMG and Eye Tracking Powered by the Neuromorphic Processor AltAi

Jan 25, 2026

Roman Akinshin, Elizaveta Lopatina, Kirill Bogatikov, Nikolai Kiz, Anna V. Makarova, Mikhail Lebedev, Miguel Altamirano Cabrera, Dzmitry Tsetserukou, Valerii Kangler

Abstract:This paper presents a novel neuromorphic control architecture for upper-limb prostheses that combines surface electromyography (sEMG) with gaze-guided computer vision. The system uses a spiking neural network deployed on the neuromorphic processor AltAi to classify EMG patterns in real time while an eye-tracking headset and scene camera identify the object within the user's focus. In our prototype, the same EMG recognition model that was originally developed for a conventional GPU is deployed as a spiking network on AltAi, achieving comparable accuracy while operating in a sub-watt power regime, which enables a lightweight, wearable implementation. For six distinct functional gestures recorded from upper-limb amputees, the system achieves robust recognition performance comparable to state-of-the-art myoelectric interfaces. When the vision pipeline restricts the decision space to three context-appropriate gestures for the currently viewed object, recognition accuracy increases to roughly 95% while excluding unsafe, object-inappropriate grasps. These results indicate that the proposed neuromorphic, context-aware controller can provide energy-efficient and reliable prosthesis control and has the potential to improve safety and usability in everyday activities for people with upper-limb amputation.

* This paper has been accepted for publication at LBR of HRI 2026 conference

Via

Access Paper or Ask Questions

Eye-Tracking-Driven Control in Daily Task Assistance for Assistive Robotic Arms

Jan 24, 2026

Anke Fischer-Janzen, Thomas M. Wendt, Kristof Van Laerhoven

Abstract:Shared control improves Human-Robot Interaction by reducing the user's workload and increasing the robot's autonomy. It allows robots to perform tasks under the user's supervision. Current eye-tracking-driven approaches face several challenges. These include accuracy issues in 3D gaze estimation and difficulty interpreting gaze when differentiating between multiple tasks. We present an eye-tracking-driven control framework, aimed at enabling individuals with severe physical disabilities to perform daily tasks independently. Our system uses task pictograms as fiducial markers combined with a feature matching approach that transmits data of the selected object to accomplish necessary task related measurements with an eye-in-hand configuration. This eye-tracking control does not require knowledge of the user's position in relation to the object. The framework correctly interpreted object and task selection in up to 97.9% of measurements. Issues were found in the evaluation, that were improved and shared as lessons learned. The open-source framework can be adapted to new tasks and objects due to the integration of state-of-the-art object detection models.

* 23 pages, 6 figures, publication in review process

Via

Access Paper or Ask Questions

Topic:Eye Tracking

Papers and Code