Marcel Walch

OR-Action: Multi-Role Video Understanding with Fine-Grained Actions

Jun 11, 2026

Felix Tristram, Ege Özsoy, Christian Benz, Marcel Walch, Ghazal Ghazaei, Nassir Navab

Abstract: Fine-grained understanding of operating room (OR) activity could enable workflow-aware assistance, yet remains difficult due to clutter, occlusions, and limited sensing. The prevailing approach models this environment with scene graphs, an interpretable representation of OR interactions. Converting their frame-wise relational predictions into temporally extended, fine-grained actions, however, is challenging without explicit temporal modeling. To enable a principled temporal evaluation of current OR understanding methods, we introduce the first action-centric benchmark built on a publicly available ego-exocentric OR dataset by defining a fine-grained, multi-role action taxonomy and generating dense action segments via distillation from ground-truth scene graph state changes. Experiments on this benchmark show that current scene graph prediction methods struggle to model temporal structure, even when explicit temporal modeling is added through Graph Neural Networks. We therefore introduce a vision-only temporal model that significantly outperforms graph-based methods when using all available egocentric video as input. Building on this model, we also introduce a novel multi-view to single-view feature alignment strategy that improves single-view performance on multi-role action recognition, mitigating the need for extensive egocentric video capture. Benchmark and code will be released upon acceptance.

The PUEVA Inventory: A Toolkit to Evaluate the Personality, Usability and Enjoyability of Voice Agents

Dec 20, 2021

Stacey Li, Sven Krome, Ilan Mandel, Marcel Walch, Wendy Ju

Abstract: The proliferation of voice agents in consumer devices requires new tools for evaluating these systems beyond their technical functionality. This paper presents a toolkit for the evaluation of Voice User Interfaces (VUIs) with the intention of measuring the crucial factors of subjective enjoyment in the user experience. The PUEVA toolkit was constructed using a meta-analysis of existing literature, structured (N=20) and semi-structured (N=18) interviews, and a within-subjects lab study. The resulting questionnaire contains 35 items that represent 12 scales in three categories: (1) Personality, (2) Usability, and (3) Enjoyability. The PUEVA Toolkit moves us towards the capacity to evaluate and compare subjective, joyful experiences in between-subjects as well as within-subjects research designs.
