Stefan K. Ehrlich

Lightweight Structured Multimodal Reasoning for Clinical Scene Understanding in Robotics

Sep 26, 2025

Saurav Jha, Stefan K. Ehrlich

Abstract: Healthcare robotics requires robust multimodal perception and reasoning to ensure safety in dynamic clinical environments. Current Vision-Language Models (VLMs) demonstrate strong general-purpose capabilities but remain limited in temporal reasoning, uncertainty estimation, and structured outputs needed for robotic planning. We present a lightweight agentic multimodal framework for video-based scene understanding. Combining the Qwen2.5-VL-3B-Instruct model with a SmolAgent-based orchestration layer, it supports chain-of-thought reasoning, speech-vision fusion, and dynamic tool invocation. The framework generates structured scene graphs and leverages a hybrid retrieval module for interpretable and adaptive reasoning. Evaluations on the Video-MME benchmark and a custom clinical dataset show competitive accuracy and improved robustness compared to state-of-the-art VLMs, demonstrating its potential for applications in robot-assisted surgery, patient monitoring, and decision support.

* 11 pages, 3 figures

Panoptica -- instance-wise evaluation of 3D semantic and instance segmentation maps

Dec 05, 2023

Florian Kofler, Hendrik Möller, Josef A. Buchner, Ezequiel de la Rosa, Ivan Ezhov, Marcel Rosier, Isra Mekki, Suprosanna Shit, Moritz Negwer, Rami Al-Maskari (+11 more)

Abstract: This paper introduces panoptica, a versatile and performance-optimized package designed for computing instance-wise segmentation quality metrics from 2D and 3D segmentation maps. panoptica addresses the limitations of existing metrics and provides a modular framework that complements the original intersection over union-based panoptic quality with other metrics, such as the distance metric Average Symmetric Surface Distance. The package is open-source, implemented in Python, and accompanied by comprehensive documentation and tutorials. panoptica employs a three-step metrics computation process to cover diverse use cases. The efficacy of panoptica is demonstrated on various real-world biomedical datasets, where an instance-wise evaluation is instrumental for an accurate representation of the underlying clinical task. Overall, we envision panoptica as a valuable tool facilitating in-depth evaluation of segmentation methods.
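
For readers unfamiliar with the intersection over union-based panoptic quality mentioned in the abstract, the following is a minimal NumPy sketch of the standard definition: the summed IoU of matched instance pairs divided by TP + 0.5·FP + 0.5·FN. It is not panoptica's API. The function name `panoptic_quality`, the convention that label 0 is background, and the greedy matching loop are assumptions made for illustration only.

```python
import numpy as np


def panoptic_quality(pred, ref, iou_threshold=0.5):
    """Hypothetical helper (not part of panoptica): panoptic quality of two
    instance label maps of equal shape (2D or 3D), with 0 as background."""
    pred = np.asarray(pred)
    ref = np.asarray(ref)
    pred_ids = [i for i in np.unique(pred) if i != 0]
    ref_ids = [i for i in np.unique(ref) if i != 0]

    matched_ious = []     # IoU of every true-positive pair
    matched_pred = set()  # predicted instances already assigned

    for r in ref_ids:
        ref_mask = ref == r
        for p in pred_ids:
            if p in matched_pred:
                continue
            pred_mask = pred == p
            intersection = np.logical_and(ref_mask, pred_mask).sum()
            if intersection == 0:
                continue
            union = np.logical_or(ref_mask, pred_mask).sum()
            iou = intersection / union
            # With a threshold of at least 0.5, each reference instance can
            # match at most one prediction, so greedy assignment is enough.
            if iou > iou_threshold:
                matched_ious.append(iou)
                matched_pred.add(p)
                break

    tp = len(matched_ious)
    fp = len(pred_ids) - tp  # predictions without a matching reference
    fn = len(ref_ids) - tp   # references without a matching prediction
    denominator = tp + 0.5 * fp + 0.5 * fn
    if denominator == 0:
        return float("nan")  # no instances in either map
    return float(sum(matched_ious) / denominator)
```

A threshold below 0.5 would allow one instance to overlap several candidates above the threshold, in which case the greedy loop here would no longer be a sound matching strategy and an optimal assignment would be needed instead.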

* 15 pages, 6 figures, 3 tables
