Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Varun Mishra

From 'What' to 'How' and 'Why': Sharing LLM-Generated Retrospective Summaries of Older Adults' Passive Tracking Data with Remote Family Members

Jun 02, 2026

Jiachen Li, Reina Szeyi Chan, Akshat Choube, Xiang Zhi Tan, Elizabeth Mynatt, Varun Mishra

Abstract:With the growing prevalence of modern ubiquitous computing technologies, multi-modal tracking systems hold promise for providing timely awareness and reassurance to stakeholders such as remote family members (RFMs) of older adults, who play a central role in care coordination. However, combining heterogeneous data streams into high-level, meaningful content - such as retrospective summaries - remains challenging. While recent work has demonstrated the promise of large language models (LLMs) for interpreting multi-modal tracking data, less attention has been given to generating narrative accounts for stakeholders like RFMs, who possess rich personal knowledge of older adults and strong emotional responsibility, yet have limited visibility into their daily lives and limited capacity for caregiving. In this work, we explore how LLMs can be used to generate retrospective summaries from multi-modal tracking data for RFMs of older adults. We leveraged and customized an existing system, Vital Insight, to generate initial summaries on different dates and data availability scenarios as technology probes, and conducted interviews with 11 RFMs to gather feedback. Based on these insights, we redesigned the system into a multi-layer, multi-agent, insight-driven summary approach that builds from objective statistics and descriptions to enriched, context-aware narratives. We then compared the redesigned summaries with the initial versions through a survey with the same 11 RFMs and found significant improvements in satisfaction, perceived helpfulness, trust, and willingness to receive the summaries. We conclude by presenting design implications for AI-generated summaries for RFMs and broader contexts, emphasizing the need to support RFMs' sensemaking shift from simply presenting ''What'' data were collected, to explaining ''How'' is my loved one doing and ''Why''.

Via

Access Paper or Ask Questions

Detecting Drunk Driving Using Off-the-Shelf Smartwatches

May 22, 2026

Robin Deuber, Lanlan Yang, Michal Bechny, Christoph Heck, Matthias Pfäffli, Matthias Bantle, Florian von Wangenheim, Elgar Fleisch, Wolfgang Weinmann, Manuel Günther(+2 more)

Abstract:Alcohol-impaired driving remains a major yet preventable cause of road traffic injury and death, with many drivers underestimating their level of intoxication. Compared to in-vehicle systems, mobile drunk-driving detection using consumer smartwatches offers a scalable way to trigger preventive interventions and increase awareness without additional in-vehicle hardware. We introduce a system that leverages wrist accelerometer data and heart rate variability-derived physiological signals to detect alcohol-related driving impairment. We collected data in a randomized, controlled three-arm test-track study (n=54) and trained both logistic regression models with window-aggregated features and a two-tower 1D convolutional neural network (CNN), to detect alcohol-impaired driving. The CNN achieved a participant-averaged area under the receiver operating characteristic (AUROC) of 0.88 for detecting any alcohol intoxication and 0.86 for detecting driving above the WHO-recommended limit of 0.05 g/dL. To the best of our knowledge, this is the first work to (1) demonstrate drunk-driving detection using consumer smartwatches, (2) develop and evaluate such a system in a real vehicle on a closed test track, and (3) rigorously assess generalization to unseen participants. Together, these findings highlight the potential of wearable-based sensing to support scalable, measurement-driven prevention of alcohol-related traffic harm.

* 27 pages, 7 figures

Via

Access Paper or Ask Questions

Extending Stress Detection Reproducibility to Consumer Wearable Sensors

May 09, 2025

Ohida Binte Amin, Varun Mishra, Tinashe M. Tapera, Robert Volpe, Aarti Sathyanarayana

Figure 1 for Extending Stress Detection Reproducibility to Consumer Wearable Sensors

Figure 2 for Extending Stress Detection Reproducibility to Consumer Wearable Sensors

Figure 3 for Extending Stress Detection Reproducibility to Consumer Wearable Sensors

Figure 4 for Extending Stress Detection Reproducibility to Consumer Wearable Sensors

Abstract:Wearable sensors are widely used to collect physiological data and develop stress detection models. However, most studies focus on a single dataset, rarely evaluating model reproducibility across devices, populations, or study conditions. We previously assessed the reproducibility of stress detection models across multiple studies, testing models trained on one dataset against others using heart rate (with R-R interval) and electrodermal activity (EDA). In this study, we extended our stress detection reproducibility to consumer wearable sensors. We compared validated research-grade devices, to consumer wearables - Biopac MP160, Polar H10, Empatica E4, to the Garmin Forerunner 55s, assessing device-specific stress detection performance by conducting a new stress study on undergraduate students. Thirty-five students completed three standardized stress-induction tasks in a lab setting. Biopac MP160 performed the best, being consistent with our expectations of it as the gold standard, though performance varied across devices and models. Combining heart rate variability (HRV) and EDA enhanced stress prediction across most scenarios. However, Empatica E4 showed variability; while HRV and EDA improved stress detection in leave-one-subject-out (LOSO) evaluations (AUROC up to 0.953), device-specific limitations led to underperformance when tested with our pre-trained stress detection tool (AUROC 0.723), highlighting generalizability challenges related to hardware-model compatibility. Garmin Forerunner 55s demonstrated strong potential for real-world stress monitoring, achieving the best mental arithmetic stress detection performance in LOSO (AUROC up to 0.961) comparable to research-grade devices like Polar H10 (AUROC 0.954), and Empatica E4 (AUROC 0.905 with HRV-only model and AUROC 0.953 with HRV+EDA model), with the added advantage of consumer-friendly wearability for free-living contexts.

* Accepted at IEEE EMBC 2025

Via

Access Paper or Ask Questions

Imputation Matters: A Deeper Look into an Overlooked Step in Longitudinal Health and Behavior Sensing Research

Dec 08, 2024

Akshat Choube, Rahul Majethia, Sohini Bhattacharya, Vedant Das Swain, Jiachen Li, Varun Mishra

Figure 1 for Imputation Matters: A Deeper Look into an Overlooked Step in Longitudinal Health and Behavior Sensing Research

Figure 2 for Imputation Matters: A Deeper Look into an Overlooked Step in Longitudinal Health and Behavior Sensing Research

Figure 3 for Imputation Matters: A Deeper Look into an Overlooked Step in Longitudinal Health and Behavior Sensing Research

Figure 4 for Imputation Matters: A Deeper Look into an Overlooked Step in Longitudinal Health and Behavior Sensing Research

Abstract:Longitudinal passive sensing studies for health and behavior outcomes often have missing and incomplete data. Handling missing data effectively is thus a critical data processing and modeling step. Our formative interviews with researchers working in longitudinal health and behavior passive sensing revealed a recurring theme: most researchers consider imputation a low-priority step in their analysis and inference pipeline, opting to use simple and off-the-shelf imputation strategies without comprehensively evaluating its impact on study outcomes. Through this paper, we call attention to the importance of imputation. Using publicly available passive sensing datasets for depression, we show that prioritizing imputation can significantly impact the study outcomes -- with our proposed imputation strategies resulting in up to 31% improvement in AUROC to predict depression over the original imputation strategy. We conclude by discussing the challenges and opportunities with effective imputation in longitudinal sensing studies.

Via

Access Paper or Ask Questions

Vital Insight: Assisting Experts' Sensemaking Process of Multi-modal Personal Tracking Data Using Visualization and LLM

Oct 18, 2024

Jiachen Li, Justin Steinberg, Xiwen Li, Akshat Choube, Bingsheng Yao, Dakuo Wang, Elizabeth Mynatt, Varun Mishra

Figure 1 for Vital Insight: Assisting Experts' Sensemaking Process of Multi-modal Personal Tracking Data Using Visualization and LLM

Figure 2 for Vital Insight: Assisting Experts' Sensemaking Process of Multi-modal Personal Tracking Data Using Visualization and LLM

Figure 3 for Vital Insight: Assisting Experts' Sensemaking Process of Multi-modal Personal Tracking Data Using Visualization and LLM

Figure 4 for Vital Insight: Assisting Experts' Sensemaking Process of Multi-modal Personal Tracking Data Using Visualization and LLM

Abstract:Researchers have long recognized the socio-technical gaps in personal tracking research, where machines can never fully model the complexity of human behavior, making it only able to produce basic rule-based outputs or "black-box" results that lack clear explanations. Real-world deployments rely on experts for this complex translation from sparse data to meaningful insights. In this study, we consider this translation process from data to insights by experts as "sensemaking" and explore how HCI researchers can support it through Vital Insight, an evidence-based 'sensemaking' system that combines direct representation and indirect inference through visualization and Large Language Models. We evaluate Vital Insight in user testing sessions with 14 experts in multi-modal tracking, synthesize design implications, and develop an expert sensemaking model where they iteratively move between direct data representations and AI-supported inferences to explore, retrieve, question, and validate insights.

Via

Access Paper or Ask Questions

SeSaMe: A Framework to Simulate Self-Reported Ground Truth for Mental Health Sensing Studies

Mar 27, 2024

Akshat Choube, Vedant Das Swain, Varun Mishra

Figure 1 for SeSaMe: A Framework to Simulate Self-Reported Ground Truth for Mental Health Sensing Studies

Figure 2 for SeSaMe: A Framework to Simulate Self-Reported Ground Truth for Mental Health Sensing Studies

Figure 3 for SeSaMe: A Framework to Simulate Self-Reported Ground Truth for Mental Health Sensing Studies

Figure 4 for SeSaMe: A Framework to Simulate Self-Reported Ground Truth for Mental Health Sensing Studies

Abstract:Advances in mobile and wearable technologies have enabled the potential to passively monitor a person's mental, behavioral, and affective health. These approaches typically rely on longitudinal collection of self-reported outcomes, e.g., depression, stress, and anxiety, to train machine learning (ML) models. However, the need to continuously self-report adds a significant burden on the participants, often resulting in attrition, missing labels, or insincere responses. In this work, we introduce the Scale Scores Simulation using Mental Models (SeSaMe) framework to alleviate participants' burden in digital mental health studies. By leveraging pre-trained large language models (LLMs), SeSaMe enables the simulation of participants' responses on psychological scales. In SeSaMe, researchers can prompt LLMs with information on participants' internal behavioral dispositions, enabling LLMs to construct mental models of participants to simulate their responses on psychological scales. We demonstrate an application of SeSaMe, where we use GPT-4 to simulate responses on one scale using responses from another as behavioral information. We also evaluate the alignment between human and SeSaMe-simulated responses to psychological scales. Then, we present experiments to inspect the utility of SeSaMe-simulated responses as ground truth in training ML models by replicating established depression and anxiety screening tasks from a previous study. Our results indicate SeSaMe to be a promising approach, but its alignment may vary across scales and specific prediction objectives. We also observed that model performance with simulated data was on par with using the real data for training in most evaluation scenarios. We conclude by discussing the potential implications of SeSaMe in addressing some challenges researchers face with ground-truth collection in passive sensing studies.

Via

Access Paper or Ask Questions